I. INTRODUCTION


One of the problems currently being studied by the
Naval Research Laboratory (NRL) involves the processing of
frequent and complex messages from satellites. The
processing of these messages requires a high percentage of
bit manipulations, which use a large amount of central
processing unit (CPU) time. The currently available
computers do not have sufficient capability to perform this
processing in a timely manner. There are several options
available to the NRL for improving the situation. One
option is the use of a very fast computer; however, the
cost of such a computer is very high. The purpose of this
project is to evaluate another, less costly option using an
automatic microcode generating system (AMGS).

JRS  Research  Laboratories  Inc.  has  developed  an  AMGS 
which  generates  microcode  for  the  writeable  control  store 
(WCS)  on  the  VAX  11/780.  The  JRS  AMGS  was  developed  to 
provide  a  low  cost  technique  for  algorithm  implementation 
which  provides  the  performance  of  microcode,  yet  does  not 
require  detailed  machine  level  microprogramming.  The  JRS 
AMGS  is  a  software  package  that  generates  microcode  from  a 
high level language (HLL), thereby eliminating the need for
the programmer to be concerned with the details of
microcode. The user, therefore, need not understand
microcode programming and may apply the principles of
software engineering through the use of an HLL. Figure 1-1
[Ref. 1: p. 539] shows the steps involved in generating
microcode from the HLL using the AMGS. It is important to
note where the AMGS is machine independent and where it is
machine dependent. This will be important in later
discussions of the system.

Since the target machine of the JRS AMGS, the VAX
11/780, is a horizontally microprogrammed processor, it is
capable of executing a number of operations simultaneously.
This is the key ingredient to improving the speed and
efficiency of the executable code because several
microoperations may be executed concurrently. [Ref. 1: pp.
558-559] By applying the JRS AMGS to the data manipulation
requirements of the satellite communication problem, a
reduction in required CPU time should be achieved.

Since  the  current  method  used  by  NRL  for  implementing 
the  algorithms  is  to  write  them  in  Fortran,  this  project 
will  compare  the  execution  speed  attained  using  the  AMGS  to 
the  execution  speed  attained  using  Fortran  code.  The 
results  of  this  comparison  will  provide  an  understanding  of 
the  type  of  algorithms  that  are  suitable  for  implementation 
via  the  JRS  AMGS,  the  performance  improvements  to  these 
algorithms,  and  the  costs  of  using  this  implementation 
technique.  This  study  is  based  on  two  aspects  of  computer 
science:  microprogramming  and  computer  performance 
evaluation. Before the study can be described, these two
areas must be defined.

Wilkes defined microprogramming as a method of
implementing the control function of a computer. [Ref. 2:
p. 99] The major advantages of microprogramming are:

1) Low Cost: Microprogramming allows large instruction
sets to be implemented at a low cost because of the
simple design process. Developing a hardwired design of
an equivalent system would be very expensive.

2) Flexibility: With microprogramming, it is possible to
change the instruction set or to introduce a new set
after implementation. This may allow a computer to be
useful for many more years than originally planned.

3) Simplicity: Microprogramming allows for simpler
development due to the decrease in internal circuitry.
This simpler design facilitates maintenance and reduces
the problems associated with upgrading the design in the
field.

4) Speed: Although microprogramming is slower than some
hardwired designs, a microprogrammed implementation will
run faster on most algorithms than an equivalent machine
language implementation. This is due to the machine
language fetch and decode overhead. [Ref. 3: p. 5]

A major disadvantage of microprogramming is the memory
delay penalty for fetching each microinstruction from the
control store. This fetch penalty can result in slow
execution times if not dealt with properly, but the problem
can be made less significant by providing an overlap
between the fetch and the execute portions of
microinstructions. [Ref. 3: p. 5]

Since this project is concerned with comparing Fortran
code with microcode, it is important to review the
tradeoffs between using Fortran (or some other HLL) and
microcode. Programming in microcode is very tedious and
complex because the programmer must deal with the details
of the machine. However, it is this complexity of
microcode that can, through proper programming, lead to a
speed advantage. On the other hand, Fortran and similar
HLLs are not nearly as complex because the details of the
machine are handled by the compiler. The slower execution
speed for HLLs results from both the generalization
required in the code generation portion of the compiler and
the instruction fetch and decode penalty described in the
explanation of the microcode speed advantage.

Microprogramming, in its present state, may be used to
provide efficient implementations of the control function
on computers. While not providing the fastest execution
speed possible, microcoding will provide a given level of
throughput at a cheaper price than is otherwise possible.
In addition, the speed of microcode has recently improved
because of the development of fast, inexpensive
semiconductor memories. These are the two main reasons to
suggest that the AMGS can give a performance advantage over
Fortran source code.


Since this study involves the evaluation of the
performance of microcode, it is important to review the
relevant performance evaluation techniques, methods, and
problems. The performance evaluation in this study is a
comparison between different implementations of the same
algorithm. The classic application of performance
evaluations is on operating systems to determine how to
improve the system. To achieve the comparison, the
evaluators must define a benchmark which represents the
type of workload that occurs on that system. Defining this
workload properly and accurately is very important or the
results will be invalid. In the case of this study, a
major consideration is the definition of the algorithms to
be implemented and compared.

One  option  when  picking  the  algorithms  is  to  choose  a 
very  specific  application  area  and  test  only  within  that 
area.  Another  option  is  to  attempt  to  test  the  entire 
realm  of  possible  applications  which  would  take  many 
different algorithms. In Chapter Three, the application
areas of interest will be defined and the subsets of these
areas to be tested will be identified. The tests will be
as comprehensive as possible and will cover as large an
area as possible; however, exhaustive testing of the entire
realm of applications is not possible.

The evaluation technique will consist of implementing
the algorithm in both Fortran and in the AMGS HLL. Both
the Fortran and the HLL codes will be executed and timed, and
the execution times will be compared to determine the
change in performance with the HLL microcode version.
Since the algorithms are grouped according to application,
it is possible to determine which applications have
increased throughput from use of the AMGS.
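
The timing comparison described above can be illustrated with a minimal sketch. It is written in Python rather than in Fortran or the JRS HLL, and the two bit-counting routines, the helper names time_call and speedup, and the choice of best-of-N wall-clock timing are all assumptions made for illustration; they are not the timing mechanism used in this study.

```python
import time

def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def speedup(baseline_seconds, candidate_seconds):
    """A ratio above 1 means the candidate ran faster than the baseline."""
    return baseline_seconds / candidate_seconds

# Two hypothetical implementations of the same bit-manipulation task:
# counting the one bits in a list of non-negative integers.
def popcount_loop(words):
    total = 0
    for w in words:
        while w:
            total += w & 1
            w >>= 1
    return total

def popcount_builtin(words):
    return sum(bin(w).count("1") for w in words)

data = list(range(2000))
t_loop = time_call(popcount_loop, data)
t_builtin = time_call(popcount_builtin, data)
print(f"speedup of second version over first: {speedup(t_loop, t_builtin):.2f}x")
```

Both versions must produce identical results before their times are compared; otherwise the ratio says nothing about the algorithm.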

Several contributing factors must be considered during
this performance evaluation. The effect of using an HLL
instead of assembly language or direct microprogramming to
implement the microcode version is significant because of
the costs involved in using each method. Similar costs are
associated with all HLLs, be it the JRS HLL and its
associated compiler or Fortran and its resultant
translation. Likewise, looking at the more primitive
languages of assembly code and microcode, the costs of
programming in both languages are very similar. However,
not so obvious is the tradeoff involved in choosing one
type of language (high level versus low level) over
another. The specific compilation techniques may also have
an effect on the efficiency of the product and must be
considered. The microcode compaction method used will
certainly affect how fast the microcode executes. A
performance evaluation must analyze many factors, both
individually and combined, to produce valid results.

Chapter Two is a discussion of the issues of AMGS
design, compiler technique, code generation, and code
optimization. The purpose of this discussion is to assess
the effect each item has on the entire system so that
during the evaluation of the AMGS these factors can be
properly analyzed.

Chapter Three describes how each program was generated.
Because the AMGS is being evaluated for all applications,
this chapter defines the basic areas that are tested in the
project. After the basic areas are defined, the specific
tests developed to cover these areas are explained and the
information to be gained from each test is outlined.

Chapter  Four  explains  the  mechanics  of  the  testing 
including  the  timing  mechanism  and  the  effects  of  language 
features  on  the  tests.  An  analysis  of  possible  sources  of 
errors  is  also  included  here  to  explain  the  validity  of  the 
results. 

Chapter Five compares the data from the tests and
analyzes the results. A step-by-step explanation of the
testing is enumerated to ensure a proper understanding of
why certain tests were performed. The last chapter
summarizes the results and makes deductions and
recommendations for further research in this area.


II. BACKGROUND

With the many advantages of microprogrammed computers
there is no apparent reason why microprogramming should not
be used for most applications. Low cost, fast execution
time, and simplicity of design sound like exactly what a
computer designer desires. There is, however, the problem
of developing the microprograms (commonly called firmware)
for the computer in a reasonable amount of time and with
reasonable cost. Developing firmware has been a costly,
error prone, and slow process because it has been done
manually and because of the details that must be handled by
the microprogrammer.

The obvious answer is to eliminate the use of low level
languages and place the microprogrammer into the world of
high level languages. That is the intent of the AMGS.
However, along with the advantages of an HLL come problems
and considerations that cannot be ignored. This chapter
will explore the many considerations of the AMGS and
discuss their impact on the JRS AMGS. The following topics
are considered to be the most relevant and will be
discussed in depth in this chapter: 1) High Level Languages
and Microprogramming, 2) Machine Independence, 3)
Compaction and Optimization, 4) JRS AMGS Limitations, and
5) Performance Evaluation Methodology.

A. HIGH LEVEL LANGUAGES AND MICROPROGRAMMING


Higher level languages are designed to simplify
programming by isolating the programmer from the details of
the machine and placing him at a higher level of
abstraction. An AMGS removes the programmer from the
details of microprogramming and allows the programmer to
write the code in an HLL. Writing a program in an HLL
takes much less time than writing the same program in
microcode because the programmer must deal with fewer
details. There are many studies that have shown the
advantages of using HLLs instead of assembly code. One such
study claims that a programmer produces a set number of
lines of code per day, independent of the type of code.
Since one line of HLL code will produce many lines of
microcode, it is logical to opt for the HLL if all other
factors are equal. [Ref. 4: p. 145]

However, all other factors are not equal. Since the
HLL used for generating the microcode is a special purpose
language, any program written to use this system must be
translated from another language before it can be used. In
this particular case, the time required to program in the
HLL provided by JRS must be considered. Of course the time
required to translate the algorithm to the HLL should be
much less than would be required to translate the algorithm
into assembly language code or into microcode. Since the
JRS HLL is a language heavily influenced by the block
structure of Algol, Fortran, and Pascal, any algorithm
written in a block structured language should be easy to
translate to the JRS HLL.

Software engineering literature provides many reasons
for using HLLs. One such reason is that HLLs provide
security not possible using microcode or assembly code.
Forced typing of variables is one example of the security
provided by HLLs. High level languages also provide
features  such  as  subprograms  which  are  an  advantage  because 
they  assist  the  user  in  subdividing  the  program  into 
logical  units.  These  logical  units  make  the  problem  easier 
to  understand  and  handle. 

The  chief  advantage  of  an  HLL  is  the  ease  of  program 
maintenance  which  results  in  lower  life  cycle  costs  for  a 
program. Program changes can be very expensive if the
programmer  must  read  and  understand  low  level  code,  with  or 
without  good  documentation.  Through  use  of  an  HLL,  program 
changes  can  be  made  much  more  quickly  and  simply,  with 
reduced  costs. 

The advantages of HLLs are all 'nice' for the
programmer, but it is important to consider the
side-effects of HLL usage. If the advantages of an HLL
detract from the advantages of microprogramming (i.e.,
simplicity, cost, flexibility, speed) then using the AMGS
may not be justified. On the other hand, if the AMGS
eliminates or minimizes other problems of microcode, then
the AMGS will be even more desirable. One such
undesirable property of microcode is its machine
dependence. In the next section we will look at the effect
of using an HLL on the machine dependence of the resulting
microcode.

B.  MACHINE  INDEPENDENCE 

Machine independence is a major concern during the
development of an AMGS because of the desire to make
firmware portable. The AMGS is a tool used to help achieve
the goals of machine independence and portability. Machine
independence and portability are desirable characteristics
for any computer language because if the code may be used
on more than one computer, the overall firmware development
costs will be lower. If every different target machine
must have its own version of the algorithm written
specifically for it, then the cost of program development
will be a function of the number of target machines. A
much more desirable method is to write one program that may
be used on every machine, resulting in only one program
being developed.

A  machine  dependent  language  is  "a  language  in  which 
all  operations  and  data  elements  defined  in  the  language 
have  a  direct  mapping  to  a  resource  of  the  target  machine." 
[Ref. 5: p. 194] The actual microcode is such a language
because it specifically addresses the available registers
of a machine. To remain machine independent and avoid the
problems of machine dependence, a language must avoid the
details of the target machine and remain general enough to
not become tied to any specific instruction set. This can
be accomplished by defining an overall class of machines
and then writing the language to fit into that definition.
Such a definition includes such items as the minimum number
of registers, the minimum stack size, and other hardware
related items. Capitalizing on the similarities and
avoiding the differences of the machines in the class
simplifies this task. Any item that is not common to all
machines in the class must not be included in the
definition because it cannot be supported by all machines
in the class.

The AMGS supports machine independence and portability
of microcode by providing an intermediate language and an
HLL that avoids the direct mapping to the machine
resources. Since these two components of the AMGS are
machine independent, a user may write a program in the AMGS
HLL and then use the code on different target machines.
The major problem is making the transition from the machine
independent intermediate language to the target machine's
microcode. To achieve this, each machine requires a
separate code generator to translate the intermediate
language to the microcode level plus a compactor to compact
the resulting microcode. This is not a trivial step and
there is currently considerable research being conducted on
microarchitecture description techniques that will assist
in making this step easier. Geiser has introduced a
description methodology that covers four basic areas:

1) Microinstruction description: includes the format of
the microinstruction, fields used in the
microinstruction, and possible values in each field.

2) Element descriptions: describes and names elements
of the machine hardware including memory, registers, etc.

3) Microoperation usage rules: a set of rules for
constructing valid microoperations.

4) Microengine behavioral rules: specifies interactions
between the microoperations. [Ref. 6: pp. 517-521]

By using this technique it is possible to describe the
target machine in a standardized format so that the writing
of the machine dependent code generator is much easier.

Of course the description methodology does not
eliminate the problem. The main purpose of a description
methodology is to reduce the work required to port the
language to another machine by maximizing the common
features of the different machine dependent languages.
This may eliminate desirable machine dependent features but
it does permit a 'nearly machine independent' language.
The assumption is if you cannot be totally independent then
be as independent as possible. [Ref. 5: p. 195]


True machine independence has not been achieved in this
AMGS and probably will not be achieved in the near future;
however, the microarchitecture description methodology is an
attempt at reducing the portability problem. By providing
a systematic description of microarchitectures, the
description methodology reduces the amount of work required
to move a system to another comparable machine. The AMGS
is providing a step toward an ultimate goal of machine
independence that may never be achieved. However, the AMGS
has helped to define and simplify some of the steps
involved in making microcode generation less machine
dependent.

C.  COMPACTION  AND  OPTIMIZATION 

Before reviewing the current compaction techniques it
is important to understand the difference between
compaction and optimization. Microcode compaction will
reduce the space required to store a program but does not
guarantee a reduction in execution time.
Optimization, on the other hand, results in a reduction in
execution time but does not guarantee that any code
compaction will occur. Sometimes execution time will
decrease when the code is compacted, but the reduction of
execution time is not guaranteed; in fact, execution time
can in some cases increase. The only conclusion that can
be drawn is that successful compaction guarantees fewer
total instructions and may lead to faster or possibly
slower execution speed.

Most HLL compilers do include an optimization step;
however, the present version of the JRS HLL compiler does
not. There are two reasons for this. One reason is that
excessive optimizations prior to microcode generation can
make error correction very difficult because of the
movement of the microoperations. Secondly, since this was
the first production version of an AMGS, some of the more
difficult problems were not handled. Optimizing the
compiler without excessively affecting error correction is
one of the more difficult problems. The AMGS does as a
whole include a number of optimization steps designed to
produce more efficient microprograms. An example of such a
step is the use of registers to hold array offset addresses
to help reduce memory fetch delay. Even though none of the
common compiler optimization techniques are used in this
system, it is important to discuss them here to understand
the effect they could have on microcode compaction.

Gries gives a good explanation of the four main
compiler  optimization  techniques  that  are  applicable  to 
almost  any  algebraic  programming  language  such  as  Fortran, 
Pascal,  Algol,  PL/I,  etc.  The  four  methods  are: 

1)  Folding:  for  any  operator  whose  operands  are  known  at 
compile  time,  perform  the  applicable  operation  at  compile 
time  rather  than  at  execution  time. 


2) Eliminating redundant operations: mainly factoring out
common subexpressions.

3) Moving operations out of loops if their operands do
not change within the loop.

4) Reducing the number of multiplications in loops:
effectively changing the multiplications to additions.
[Ref. 7: pp. 376-377]

A system may use these techniques to attain whatever level
of optimization is desired; however, there is a tradeoff
between the level of optimization and the time required to
perform the compilation. Also, as mentioned above,
extensive optimization will result in radically altering
the sequence of operations and therefore make debugging
very difficult. [Ref. 7: p. 376]
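
The four techniques listed above can be illustrated by applying them by hand to a small routine. The sketch below is in Python only for readability; the function names unoptimized and optimized and the arithmetic they perform are hypothetical examples, not code from the JRS system. Each numbered comment corresponds to one of the four techniques.

```python
def unoptimized(a, b, n):
    out = []
    for i in range(n):
        seconds = 60 * 60            # both operands known before execution
        scale = a * b                # operands never change inside the loop
        x = (a + b) * i + (a + b)    # (a + b) is computed twice
        out.append(x + scale + seconds + i * 4)
    return out

def optimized(a, b, n):
    seconds = 3600                   # 1) folding: 60 * 60 done ahead of time
    s = a + b                        # 2) common subexpression computed once
    scale = a * b                    # 3) invariant operation moved out of loop
    out = []
    s_times_i = 0                    # 4) s * i and i * 4 are kept as running
    i_times_4 = 0                    #    sums, so multiplies become additions
    for _ in range(n):
        x = s_times_i + s
        out.append(x + scale + seconds + i_times_4)
        s_times_i += s
        i_times_4 += 4
    return out

print(unoptimized(3, 5, 4) == optimized(3, 5, 4))
```

The optimized loop body contains no multiplications, but its statements no longer line up one-for-one with the source, which is the debugging difficulty noted above.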

Even  though  optimization  is  important,  there  has  been 
very  little  work  done  on  optimization  of  microcode.  Almost 
all  of  the  work  done  on  microcode  has  been  in  the  field  of 
compaction  because  optimization  of  microcode  is  very 
difficult  to  do  systematically  and  is  not  well  understood. 
Most  microcode  compaction  research  has  been  justified  by 
the  assumption  that  execution  time  will  decrease  when  the 
code  is  compacted.  It  is  important  to  keep  this  assumption 
in  mind  when  discussing  compaction  because  the  results  of 
the  compaction  are  not  guaranteed  to  reduce  execution  time 
and  will  certainly  not  optimally  reduce  execution  time. 
However, compaction is the only automated method for
improving microcode that is currently available for
practical use.


It is important to remember the assumption that the
target machine will be horizontally microprogrammable,
meaning that more than one operation may be executed during
any microinstruction. If the target machine is not
horizontally microprogrammable, then only one microoperation
may occur during any microinstruction (or machine cycle)
and compaction is not possible. There are two classes of
microcode compaction for horizontally microprogrammable
computers, local and global, and a discussion of the
compaction techniques from both classes will follow. JRS
does not do any code compaction in this version of the
AMGS. However, by reviewing the many methods of compaction
available it will be evident which methods are the most
promising for future improvements.

Local  compaction  of  microcode  is  concerned  with  the 
reduction  of  the  number  of  microinstructions  in  a 
straight-line  code  (SLC)  segment  of  a  microprogram.  An  SLC 
segment is any sequence of microinstructions that begins
either at the start of the program or after a branch
statement and ends either at the end of the program or at a
branch statement. Only one entrance and one exit is
allowed in any SLC segment. Local compaction is simply an
attempt at reducing the number of microinstructions in each
SLC segment by combining instructions or eliminating
duplicated instructions. The most promising and popular
versions are first-come first-serve, critical path, branch
and bound, and list scheduling.
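
The SLC segment definition given above can be sketched as a simple partitioning routine. This is a minimal illustration in Python under stated assumptions: the function name split_slc and the tuple layout are hypothetical, and the rule that a branch target also starts a new segment is an inference from the "only one entrance" requirement rather than something stated explicitly in the text.

```python
def split_slc(program):
    """Partition a microprogram into straight-line code (SLC) segments.

    program is a list of (name, is_branch, is_branch_target) tuples.
    A segment ends at a branch; a new segment also starts at any
    branch target so that each segment has exactly one entrance.
    """
    segments, current = [], []
    for name, is_branch, is_target in program:
        if is_target and current:
            segments.append(current)   # close segment before the new entrance
            current = []
        current.append(name)
        if is_branch:
            segments.append(current)   # a branch is the single exit
            current = []
    if current:
        segments.append(current)       # segment ending at end of program
    return segments

example = [
    ("a", False, False), ("b", False, False), ("br1", True, False),
    ("c", False, True), ("d", False, False),
    ("e", False, True), ("br2", True, False),
    ("f", False, False),
]
print(split_slc(example))
```

Each local compaction method discussed in this section would then be applied to one such segment at a time.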

First-come first-serve is probably the simplest form of
local compaction possible. Each microoperation is
considered only once, in source code order, and in the SLC
segment in which it exists. Each microoperation is moved as
far forward in its segment as possible. If it can be
combined with a previous operation without causing a
conflict, then it will be combined. Once a microoperation
has been checked and combined or not combined, it will
never be considered again. This results in fast compaction
but the resulting microcode is not optimally compacted.
[Ref. 8: p. 418]
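
A minimal sketch of first-come first-serve compaction on one SLC segment follows. It is written in Python under simplifying assumptions that are not taken from the source: each microoperation is described only by the registers it reads and writes and by a single hardware unit it occupies, any data dependence forces the later operation into a strictly later microinstruction, and two operations conflict only when they need the same unit. The names MicroOp, depends, and fcfs_compact are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MicroOp:
    name: str
    reads: frozenset
    writes: frozenset
    unit: str          # hardware unit (or microinstruction field) it occupies

def depends(later, earlier):
    """True if `later` must stay after `earlier` (read/write overlap)."""
    return bool(earlier.writes & later.reads
                or earlier.reads & later.writes
                or earlier.writes & later.writes)

def fcfs_compact(ops):
    """Place each op once, in source order, as early as it will fit."""
    instrs = []                                    # list of lists of MicroOp
    for op in ops:
        earliest = 0
        for idx, instr in enumerate(instrs):
            if any(depends(op, prev) for prev in instr):
                earliest = idx + 1                 # must follow this one
        for idx in range(earliest, len(instrs)):
            if all(prev.unit != op.unit for prev in instrs[idx]):
                instrs[idx].append(op)             # combined, never revisited
                break
        else:
            instrs.append([op])                    # no room: new microinstruction
    return instrs

ops = [
    MicroOp("op1", frozenset({"R2", "R3"}), frozenset({"R1"}), "alu"),
    MicroOp("op2", frozenset({"R5"}), frozenset({"R4"}), "mem"),
    MicroOp("op3", frozenset({"R1", "R4"}), frozenset({"R6"}), "alu"),
    MicroOp("op4", frozenset({"R2"}), frozenset({"R7"}), "shifter"),
]
packed = [[op.name for op in instr] for instr in fcfs_compact(ops)]
print(packed)
```

In this example four microoperations fit into two microinstructions. Because every operation is placed once and never moved again, an unlucky source order can leave the result longer than necessary.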

Critical path algorithms compact microcode in each SLC
segment by identifying microoperations "that cannot be
delayed without increasing the number of microinstructions
needed for the microprogram." [Ref. 8: p. 415] This is
accomplished by first identifying the longest paths in the
data dependency graph. Each of the longest paths is called
a critical path and shortening the path will result in a
more compact program. Each microoperation in each critical
path is checked to see if it can be moved forward and
combined with another microoperation. If it can be moved
forward, the critical path will be shortened and the result
is a more compact program. If any microoperation in any of
the critical paths is delayed (not forwarded as much as
possible), then the trailing microoperations will be
delayed, which will result in more microinstructions than
are actually needed and less compact microcode. [Ref. 8: p.
422] Once again the results are not optimal and the time
required to do the compaction is a polynomial function of
the number of microoperations which are considered in each
SLC segment.
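
The identification of critical microoperations can be sketched as a longest-path computation on the data dependency graph. The Python below is an illustration under assumptions: the function name critical_ops and the dictionary form of the dependency graph are hypothetical, hardware resource conflicts are ignored, and a microoperation is called critical when its earliest and latest possible positions coincide.

```python
def critical_ops(ops, deps):
    """Return (longest path length, ops that lie on a critical path).

    ops  : microoperation names in source order
    deps : name -> set of earlier names it depends on
    """
    early = {}
    for op in ops:                                  # earliest position of each op
        early[op] = 1 + max((early[p] for p in deps.get(op, ())), default=0)
    length = max(early.values(), default=0)

    late = {op: length for op in ops}               # latest position of each op
    for op in reversed(ops):
        for p in deps.get(op, ()):
            late[p] = min(late[p], late[op] - 1)

    # No slack between earliest and latest position: delaying the op
    # would lengthen the whole segment.
    return length, [op for op in ops if early[op] == late[op]]

ops = ["a", "b", "c", "d", "e"]
deps = {"c": {"a"}, "d": {"c"}, "e": {"b"}}
print(critical_ops(ops, deps))
```

Here the chain a, c, d is critical and fixes a lower bound of three microinstructions, while b and e each have one position of slack.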

Branch  and  bound  algorithms  can  guarantee  optimality  in 
storage  space  required  for  the  microprogram.  Remember  that 
this  says  nothing  about  the  execution  time  of  the  program. 
The  method  depends  upon  searching  a  tree  structure 

exhaustively,  looking  for  the  optimal  ordering.  This 
method  may  produce  optimal  compaction,  but  the  time 

required  is  an  exponential  function  of  the  number  of 

microoperations  in  the  microprogram,  making  the  method  very 
expensive.  There  are  variations  to  the  branch  and  bound 
algorithms  that  are  not  so  expensive.  One  such  variation 
involves  pruning  the  tree  structure  prior  to  searching  the 
tree.  This  pruning  reduces  the  cost  of  the  algorithm  to  a 
polynomial  function  of  the  number  of  input  microoperations. 
However,  the  reduction  in  cost  also  produces  less  than 
optimal microcode. [Ref. 8: p. 424]
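
The exhaustive search with a bound can be sketched as follows. This Python illustration makes several assumptions not found in the source: the function name optimal_compaction is hypothetical, each microoperation occupies one named hardware unit, a dependent operation must sit in a strictly later microinstruction, and the only bound applied is the length of the best complete placement found so far.

```python
def optimal_compaction(ops, deps, unit):
    """Exhaustively search for the fewest microinstructions.

    ops  : microoperation names in source order
    deps : name -> set of earlier names it depends on
    unit : name -> hardware unit the operation occupies
    """
    n = len(ops)
    best = {"length": n + 1, "slots": None}
    used = [set() for _ in range(n)]      # units taken in each microinstruction
    slot_of = {}

    def search(i, length):
        if length >= best["length"]:
            return                        # bound: cannot beat the best so far
        if i == n:
            best["length"], best["slots"] = length, dict(slot_of)
            return
        op = ops[i]
        earliest = 1 + max((slot_of[p] for p in deps.get(op, ())), default=-1)
        for slot in range(earliest, n):   # branch: try every legal position
            if unit[op] in used[slot]:
                continue
            used[slot].add(unit[op])
            slot_of[op] = slot
            search(i + 1, max(length, slot + 1))
            del slot_of[op]
            used[slot].discard(unit[op])

    search(0, 0)
    return best["length"], best["slots"]

ops = ["a", "b", "c", "d"]
deps = {"c": {"b"}, "d": {"c"}}
unit = {"a": "alu", "b": "alu", "c": "mem", "d": "mem"}
length, slots = optimal_compaction(ops, deps, unit)
print(length, slots)
```

The search finds a placement of length three for this example, but the number of placements it may have to examine grows exponentially with the number of microoperations, which is the expense described above.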

List scheduling searches through each SLC segment and
attempts to schedule each microoperation at the earliest
possible point within the window of code that is being
considered. The size of the window is variable, but the
larger the window the longer the time required to do the
job. Also, as the window size is increased, there is a
diminished return (diminished amount of code compaction)
for each unit increase in window size because of the
increased chance of finding a data dependency. The further
away the compaction is attempted, the greater the chance of
two data items needing the same register, or some other
data dependency. List scheduling is not optimal, but the
cost is as low as first-come first-serve and the results
are better than first-come first-serve.
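
A minimal sketch of list scheduling is given below. It uses the common textbook formulation, in which ready microoperations are ranked by the length of the dependency chain that still hangs below them, rather than the windowed description given above; that substitution, the function name list_schedule, and the one-unit-per-operation resource model are all assumptions made for illustration.

```python
def list_schedule(ops, deps, unit):
    """Fill microinstructions one at a time from a priority-ordered list.

    ops  : microoperation names in source order
    deps : name -> set of earlier names it depends on
    unit : name -> hardware unit the operation occupies
    """
    succs = {op: [] for op in ops}
    for op in ops:
        for p in deps.get(op, ()):
            succs[p].append(op)
    height = {}                                   # longest chain starting at op
    for op in reversed(ops):
        height[op] = 1 + max((height[s] for s in succs[op]), default=0)

    placed, schedule, remaining = {}, [], list(ops)
    while remaining:
        cycle = len(schedule)
        ready = [op for op in remaining
                 if all(placed.get(p, cycle) < cycle for p in deps.get(op, ()))]
        ready.sort(key=lambda op: -height[op])    # stable: ties keep source order
        busy, instr = set(), []
        for op in ready:
            if unit[op] not in busy:              # skip ops whose unit is taken
                busy.add(unit[op])
                instr.append(op)
                placed[op] = cycle
        schedule.append(instr)
        remaining = [op for op in remaining if op not in placed]
    return schedule

ops = ["a", "b", "c", "d"]
deps = {"c": {"b"}, "d": {"c"}}
unit = {"a": "alu", "b": "alu", "c": "mem", "d": "alu"}
print(list_schedule(ops, deps, unit))
```

In this example the scheduler places b first because it heads the longest chain, giving three microinstructions. Under the same model, placing the operations strictly in source order would put a first and need four.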

Of those four local methods, only list scheduling and
first-come first-serve can be done in what is considered a
'reasonable' amount of time and produce acceptable results.
The fact that list scheduling produces better results than
first-come first-serve in general was shown in a study done
by Davidson, et al. [Ref. 8: p. 460] This would justify
the use of list scheduling as the compaction method for the
AMGS if only local compaction methods were available;
however, there are global compaction techniques that should
be considered. It is an intuitive notion that global
compaction techniques should provide better compaction
since they look at the entire program and not only at small
SLC segments.

It is true that, in general, global compaction
techniques provide better compaction than local compaction
techniques, yet, in comparison to local compaction
techniques, global compaction techniques are very
expensive. Trace scheduling, tree compaction, and
compaction based on a generalized data dependency graph
(GDDG) are the three most promising global compaction
techniques. Trace scheduling identifies the most
frequently traversed path through a section of microcode
and does a local compaction on that path. The process is
repeated on all of the paths through the microprogram until
no further microoperation movement is possible. A data
dependency graph must be constructed for each path analyzed
and any microoperations that are moved must be documented.
This documentation is done to ensure that the moving of
microoperations will have no effect on other loops. The
bookkeeping for trace scheduling is the most expensive
part. In fact, in the worst case, the memory required to
run a trace scheduling compactor can grow exponentially.
[Ref. 9: p. 480] Therefore, although trace scheduling does
an excellent job of microcode compaction, the overhead is
too high to justify its use.
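
The path selection step of trace scheduling can be sketched as
follows. This is an illustration only: the block names, the
execution counts, and the forward-only growth of each trace are
assumptions, and the compaction and bookkeeping steps that
dominate the cost are deliberately left out.

```python
def pick_trace(counts, succ, visited):
    """Grow one trace forward from the hottest block not yet in a trace."""
    candidates = [b for b in counts if b not in visited]
    if not candidates:
        return []
    block = max(candidates, key=lambda b: counts[b])
    trace = []
    while block is not None:
        trace.append(block)
        visited.add(block)
        # Follow the most frequently executed successor that is still free.
        nxt = [s for s in succ.get(block, []) if s not in visited]
        block = max(nxt, key=lambda b: counts[b]) if nxt else None
    return trace

def all_traces(counts, succ):
    """Repeat trace picking until every block belongs to some trace."""
    visited, traces = set(), []
    while True:
        trace = pick_trace(counts, succ, visited)
        if not trace:
            return traces
        traces.append(trace)

# Hypothetical flow graph: A branches to B (often) or C (rarely), both join at D.
counts = {"A": 100, "B": 90, "C": 10, "D": 100}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
traces = all_traces(counts, succ)
```

For this graph the first trace is A, B, D and the rarely executed
block C is left for a second trace of its own.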

Tree  compaction  is  based  on  trace  scheduling.  The 
advantage  of  tree  compaction  over  trace  scheduling  is  the 
control  of  the  increase  in  memory  size.  Tree  compaction 
divides  the  microprogram  into  subsets  and  applies  the  trace 
scheduling  techniques  to  the  subsets  individually.  This 
achieves compaction that is close to the results achieved
by trace scheduling yet is not nearly as expensive. This
method may be useful when it is fully researched and
understood; however, tree compaction still produces
microcode that is less than optimum and the cost can be
high.

The third global compaction method is based on a global
data dependency graph (GDDG). A GDDG "is capable of
representing in a single chart the data dependency of
microorders not only within a basic block but in different
basic blocks." [Ref. 10: p. 924] Both trace scheduling and
tree compaction use a data dependency graph (DDG) to
represent the data dependency of microorders in the basic
blocks; however, a DDG is not capable of representing the
data dependencies beyond the basic block. This is the most
important aspect of global compaction: moving microorders
to adjacent blocks when possible.

Through use of the GDDG, it is possible to identify
microoperations that 'must' be in a basic block and those
that 'may' be in a basic block. Then, by identifying the
frequency of execution of the separate blocks it is
possible to make intelligent choices about moving
microoperations from block to block or within the same
block. The algorithm costs an amount which "is practically
O(n), where n is the number of microorders contained in a
source microprogram." [Ref. 10: p. 930] This is a very low
cost and the preliminary results show that the algorithm
provides compaction that is within three to five percent of
optimum (handwritten) microcode.

Of the three global compaction methods described, only
the method based on a GDDG is efficient and results in low
costs. Why then did JRS not use this compaction method in
the AMGS? The answer is that during development of the
AMGS, this compaction method was not available. JRS is
currently revising the system to incorporate the GDDG
global compaction technique, which should result in a much
more efficient system than was evaluated in this study.

By looking at the two main compaction methods, global
and local, it is evident that global compaction holds the
most promise for efficiency that approaches the optimum.
Once global compaction methods are more thoroughly
researched and developed, they will become the logical
choice if the cost can be controlled. Global methods are
the only methods that approximate the handcoded versions.
Local compaction does provide some compaction but does not
in general do as well as handwritten microcode.

D. AMGS LIMITATIONS

The AMGS developed by JRS is designed to allow a small
CPU intensive algorithm to be compiled in microcode and
placed in the WCS of the VAX 11/780. When the algorithm is
needed it can be called from a Fortran program. There are
several limitations of the system that are important to
remember when considering what applications may be used on
this system. Individually the limitations may seem small
and even unimportant; however, the combined effect of the
limitations may eliminate some of the applications.

First, the WCS only has 1K words of memory for the
microcode. Since the microcode must be loaded into the WCS
before execution due to linkage requirements, paging of the
algorithm into the WCS during execution is not considered
an option. Therefore the user is limited to an algorithm
or collection of algorithms that is no larger than 769
microinstructions because the other 255 instructions are
used for predefined functions. In fact, of the 769
microwords of memory available, about 30 instructions are
already taken up by function entry and exit code that is
required for register initialization and cannot be
modified by the user. The exact number of instructions
varies depending upon the microprogram being executed.
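
The space budget above reduces to simple arithmetic, sketched
here as a quick feasibility check. The function names are
hypothetical, and the figure of 30 entry and exit words is only
the approximate value quoted in the text.

```python
WCS_WORDS = 1024        # 1K words of writable control store
RESERVED_WORDS = 255    # words used for predefined functions
ENTRY_EXIT_WORDS = 30   # approximate; varies with the microprogram

def user_words_available(entry_exit=ENTRY_EXIT_WORDS):
    """Words left for the user's algorithm after fixed overheads."""
    return WCS_WORDS - RESERVED_WORDS - entry_exit

def fits_in_wcs(microprogram_words, entry_exit=ENTRY_EXIT_WORDS):
    """True if a compacted microprogram of the given size can be loaded."""
    return microprogram_words <= user_words_available(entry_exit)
```

Under these figures roughly 739 microwords remain for the
algorithm itself.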

Compacting a long algorithm to fit into the limited
space of the WCS may be difficult or even impossible. Once
the user has determined that the algorithm will fit in the
WCS, then he/she must determine the 'hot' spots of the
program (portions of the algorithm that use the most CPU
time), separate those parts of the program from the rest,
code those parts in the JRS HLL, and set up the microcode
procedure call. This may be only a minor inconvenience,
but it is extra effort needed to use the AMGS.

Second, JRS claims that the AMGS code will do integer
arithmetic and comparisons very quickly, but any problem
involving primarily floating point arithmetic will achieve
minimal, if any, increase in performance. This is because
the JRS HLL uses the same floating point acceleration
routines as the Fortran program. Portions of the floating
point algorithm that do not use the floating point
accelerator may execute faster when executed on the AMGS,
but the net gain will probably not be very large due to the
overhead of the floating point accelerator. During the
testing of the AMGS the truth of this claim by JRS will be
documented since there will be several tests to check the
floating point accelerator performance.

The  JRS  HLL  is  set  up  to  support  only  integer  and 
floating  point  data  structures.  No  character  data 
structure  is  available  and  therefore  applications  using 
characters  are  not  considered  feasible.  Arrays  of  integers 
and  floating  point  numbers  are  possible  but  the  lack  of  a 
character data structure will limit some applications or at
least  make  them  very  difficult  to  do. 

If the algorithm includes I/O then the algorithm must
be rewritten to eliminate the I/O from the portion of the
algorithm to be coded in JRS HLL since the HLL does not
include any I/O statements. The I/O can normally be moved
into the Fortran program that will call the WCS program.
Besides providing an I/O function, the Fortran program will
set up any data structures needed for the program. This is
really no more than a minor inconvenience, but it does
complicate the use of the system.

Several other restrictions are listed in the AMGS
manual and repeated below.

1) Combined maximum of fourteen arrays and compiler
temporary variables.

2) Maximum of twenty DO-loops nested at any one time.

3) Maximum of five hundred symbols may be defined in a
program.

These restrictions will not, in general, eliminate
applications but they are restrictions based on the
implementation of the system on the VAX 11/780. These
restrictions are important because they point out some of
the machine dependencies that exist even when an attempt is
made to remain machine independent.

The  final  limitation  of  the  JRS  AMGS  is  a  simple 
observation.  One  of  the  main  motivations  for  having  an 
AMGS  is  to  allow  for  portability  of  the  microcode. 
Presently,  this  system  is  only  implemented  on  the  VAX 
11/780.  Therefore,  a  current,  yet  hopefully  temporary 
limitation  is  that  the  AMGS  has  not  been  programmed  to 
generate  microcode  for  any  other  machines.  This  limitation 
will  result  in  eliminating  many  of  the  advantages  of  the 
AMGS if it is not corrected.

Assuming the application algorithm can be coded around
these limitations, the user should be able to achieve
better throughput by using the AMGS. A goal of this
project is to make it easier for a user to determine if a
potential application will benefit from the use of the
AMGS.

E.  PERFORMANCE  EVALUATION  METHODOLOGY 

The  performance  evaluation  was  conducted  to  determine 
the  throughput  possible  using  the  AMGS.  There  are  many 
techniques  available  for  doing  performance  evaluations 
including hand timing, formula methods, instruction mixes,
and benchmarks, each having individual advantages and
disadvantages. The method used for this evaluation must be
capable of comparing two different programs and of giving
accurate results. Therefore a collection, or benchmark, of
programs  was  defined  with  each  program  representing  a 
different  possible  application  for  the  AMGS. 

This  kernel  of  programs  was  carefully  developed  to 
contain  the  characteristics  of  the  many  possible  algorithms 
which  might  be  run  on  the  system.  This  is  a  very  important 
step  for  the  validation  of  the  results.  If  the  proper 
program  characteristics  are  not  tested,  the  results  will  be 
invalid.  By  categorizing  the  algorithms  according  to 
application  it  is  possible  to  specify  what  applications 
will  benefit  by  use  of  the  AMGS. 


After defining a kernel of programs and coding them in
both Fortran and the JRS HLL, the programs were run and the
results compared. Besides comparing execution time, other
factors previously discussed in this chapter were
considered. Ease of programming, system reliability, and
the compatibility of the application problem with the AMGS
were also considered.

One  important  question  is  how  much  better  a  manual 
microprogrammer  could  do.  The  purpose  of  using  the  AMGS  is 
to  achieve  increased  throughput  without  using  a  large 
amount of programming time as would be required with the
manual  method.  Even  though  manual  microprogramming  is 
costly  due  to  development  time,  it  is  considered  the 
standard  and  the  results  of  the  performance  evaluation 
should  be  compared  against  the  standard.  By  comparing  all 
three  execution  times,  Fortran,  JRS  microcode,  and  hand 
written  (actually  hand  compacted)  microcode,  it  will  be 
possible  to  identify  the  best  applications  and  possibly 
determine  methods  for  making  the  slower  applications 
faster.


F.  BACKGROUND  SYNOPSIS 


Since the main factors affecting the AMGS have been
reviewed, the next step is to determine the kernel of
programs to be tested. These programs must be
representative of the applications that might be used on
the AMGS. The purpose of defining this kernel is to attain
general results that will give an AMGS user an idea as to
the effectiveness of a specific application. The next
chapter will discuss the applications to be tested and the
programs used to test those applications.
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tftTION 

There are many limitations that must be considered when
choosing the proper benchmark for a system. The benchmark
must take into consideration the AMGS limitations
enumerated in the previous section and ensure that the
results are not biased by those limitations. Limitations
such as the WCS size and the existence of only integer and
real data structures have a major effect on the
applications possible when using the AMGS. With these
limitations in mind, it is possible to define some
applications that can use the AMGS. One common class of
computer application that will definitely not have
increased throughput due to AMGS use is the I/O intensive
application. The HLL was designed without I/O capability
because microcode implementations do not increase the
throughput for I/O intensive applications. However, there
are several applications for which the AMGS should
theoretically provide increased throughput.

The  applications  tested  in  this  study  are  grouped  into 
four  basic  areas.  These  areas  are: 

1)  Integer  mathematics 

2)  Floating  point  mathematics 

3)  Sorting  and  Searching  (Comparisons) 

4) Bit manipulations

There are several subcategories in the four basic areas. A
discussion of the subcategories follows.

Mathematically intensive applications that do
calculations within the limits of the AMGS are prime
candidates for the system. There are several different
types of mathematical calculations that should be
considered. Integer arithmetic must be considered
separately from floating point arithmetic due to the
different methods used for doing the calculations. Integer
addition/subtraction is handled internally by the AMGS, but
the floating point accelerator (FPA) on the VAX computer is
used for floating point calculations and integer
multiplications. This call by the AMGS to the FPA results
in a significant amount of overhead for each call. When a
Fortran program calls the FPA there is also some overhead,
but since Fortran translates to machine code and machine
code calls to the FPA involve less overhead than AMGS
calls, the net result is slower execution time for the AMGS
code than for Fortran code during floating point
operations. This extra overhead in the AMGS is due to a
requirement to save the state of the microprogram prior to
executing the floating point operation. The result is a
net loss of throughput when doing floating point
calculations or integer multiplication on the AMGS.

Several types of calculations are possible when doing
integer and floating point calculations. Division,
multiplication, addition, and subtraction are different
arithmetic operations and the increase in throughput may be
different for each type of calculation. As much as
possible, this project will categorize the different
calculations and show the percentage of increase possible
for each category; however, in the interest of reducing the
total number of tests we will combine tests that are very
similar. Since addition and subtraction take the same
amount of time in microprogrammed processors, they will be
tested together. Multiplication and division are not
implemented similarly and will not be tested together. In
fact, since division can usually be implemented as
reciprocal multiplication, division will not be tested.
Integer exponentiation is normally accomplished by a series
of multiplications and therefore will be considered a part
of the multiplication test.

Another  major  application  of  the  AMGS  is  sorting  and 
searching.  Since  sorting  and  searching  both  include 
comparisons  of  bit  patterns,  they  may  be  considered 
together  in  one  broad  category.  The  major  difference  is 
that  sorting  usually  includes  moving  of  data  or  moving  the 
pointers to the data, while searching simply involves
comparing  until  the  desired  data  is  found. 

One  final  category  that  is  directly  applicable  to  the 
NRL  problem  is  bit  manipulation.  This  category  includes 
the comparing, shifting, and replacement of bits or fields
of bits within a word. This category may be an excellent
application of the AMGS due to the bit manipulating
commands that are built into the JRS HLL such as the shift,
swap, and mask functions. Fortran has the ability to do
the bit manipulations, but the functions are provided
through library calls which tend to be slower than direct
language implementation constructs.
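
The kinds of operation meant here (shift, mask, swap, and field
replacement within a word) can be sketched in a few lines. The
function names and the 32-bit word width are assumptions for
illustration; they are not the JRS HLL or Fortran library names.

```python
WORD_BITS = 32                      # assumed word width
WORD_MASK = (1 << WORD_BITS) - 1

def get_field(word, start, width):
    """Extract `width` bits beginning at bit `start` (bit 0 is the LSB)."""
    return (word >> start) & ((1 << width) - 1)

def set_field(word, start, width, value):
    """Replace the same field with `value`, leaving other bits untouched."""
    mask = ((1 << width) - 1) << start
    return ((word & ~mask) | ((value << start) & mask)) & WORD_MASK

def swap_halves(word):
    """Exchange the upper and lower halves of the word."""
    half = WORD_BITS // 2
    return ((word << half) | (word >> half)) & WORD_MASK
```

For example, get_field(0xABCD1234, 8, 8) isolates the byte 0x12,
and set_field can overwrite that byte without disturbing its
neighbours.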

The next section of this chapter is an explanation of
each test and the basic area it is designed to test. The
explanation of the results of each test is included in
Chapter Five. Table 3-1 lists the four basic areas and the
tests that cover each area.


Table 3-1: Specific Tests

Integer Math                 Floating Point Math

1. Do Loop                   1. Chebyshev Cosine
2. While Loop                2. Fast Fourier Transform
3. Summation
4. Factorial

Sorting/Searching            Bit Manipulations

1. Bubble Sort               1. Bit Manipulation
2. Sieve of Eratosthenes     2. Bit Reversal
3. Quickersort
4. Binary Search


The  simplest  test  was  designed  using  the  loop 
structures.  The  WHILE  loop  and  the  DO  loop  provide  a 
method  for  testing  addition  or  multiplication  and  comparing 
the results directly with the Fortran equivalent. The
simplest test is a WHILE loop that only increments the
loop counter. This test can be done as many times as the
user desires and it also can be nested to any desired level
to test the effect of nesting. The basic area being
checked  in  this  test  is  the  addition  and  comparison 
required  each  time  a  loop  is  completed.  This  comparison  is 
required  to  determine  the  test  condition  for  exiting  the 
loop.  A  DO  loop  is  another  version  of  the  loop  construct, 
with  the  increment  being  automatic  and  the  condition  test  a 
part  of  the  DO  statement.  By  using  these  two  tests  it  is 
possible  to  document  how  much  time  is  required  to  execute 
the  overhead  steps  in  any  loop.  This  overhead  cost  will  be 
used  to  analyze  programs  with  loops. 

The  next  two  tests  use  the  basic  loop  structure  to 
determine  the  summation  of  an  integer  or  the  factorial  of 
an  integer.  Each  of  these  tests  can  then  be  used  with  the 
results  from  the  previous  test  to  determine  the  amount  of 
time  required  to  do  either  an  integer  multiplication  or  an 
integer  addition  by  simply  subtracting  out  the  loop 
overhead. 
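
The subtraction described above can be written out as a small
helper. The function name and the sample figures in the usage
note are hypothetical; no measured values from the study are
implied.

```python
def time_per_operation(test_seconds, loop_only_seconds, iterations):
    """Estimate the cost of one arithmetic operation.

    test_seconds      -- time for the loop that performs the operation
    loop_only_seconds -- time for the same loop with an empty body
    iterations        -- number of times the loop body was executed
    """
    return (test_seconds - loop_only_seconds) / iterations
```

If a summation loop of one million iterations took 3.0 seconds
and the empty loop took 2.0 seconds, the estimate would be one
microsecond per integer addition.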

Floating  point  multiplication  is  the  subject  of  the 
next  test.  By  implementing  a  Chebyshev  approximation  for 
the  cosine  of  an  angle  and  calculating  many  values,  it  is 
possible  to  determine  the  amount  of  time  spent  doing 
floating  point  multiplication  for  each  system.  There  are 
some floating point additions that will add overhead to the
test, but the effect of the additions should be minimal.
This test particularly reveals the overhead of calling the
floating point accelerator from the AMGS. JRS
documentation states that since both Fortran and the AMGS
use the same FPA, there should be no speed gained by use of
the AMGS. If the overhead of calling the FPA from the
microcode is too high, then it will make the AMGS slower
than the Fortran. This is an important experiment since
the  results  will  be  a  prime  factor  in  determining  if  the 
AMGS  should  be  used  for  floating  point  applications. 
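
A Chebyshev approximation of the cosine can be sketched as shown
here. The text does not give the degree, the interval, or the
coefficients used in the test program, so the 16-term series on
[-pi, pi] and all function names are assumptions chosen only to
show why the routine is dominated by floating point multiplies.

```python
import math

def chebyshev_coefficients(f, n):
    """Coefficients c[0..n-1] of f on [-1, 1] from n Chebyshev nodes."""
    coeffs = []
    for j in range(n):
        total = 0.0
        for k in range(n):
            theta = math.pi * (k + 0.5) / n
            total += f(math.cos(theta)) * math.cos(j * theta)
        coeffs.append(2.0 * total / n)
    return coeffs

def chebyshev_eval(coeffs, t):
    """Evaluate c[0]/2 + sum c[j]*T_j(t) with the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1   # one multiply-add per term
    return t * b1 - b2 + 0.5 * coeffs[0]

# Fit cos(pi * t) for t in [-1, 1], i.e. cos(x) for x in [-pi, pi].
COS_COEFFS = chebyshev_coefficients(lambda t: math.cos(math.pi * t), 16)

def cheb_cos(x):
    """Approximate cos(x) for x in [-pi, pi]."""
    return chebyshev_eval(COS_COEFFS, x / math.pi)
```

Each evaluation costs roughly one multiplication and two
additions per coefficient, which is the operation mix the test
is meant to exercise.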

There  were  three  tests  written  to  evaluate  the  ability 
of  the  system  to  do  comparisons.  The  first  is  a  sort 
algorithm called Quickersort written by R. S. Scowen. This
algorithm  works  by  continually  splitting  the  array  of 
values  to  be  sorted  into  parts  and  sorting  the  parts  using 
the  same  method.  The  second  algorithm  is  a  method  to 
determine  all  of  the  prime  numbers  between  two  values. 
This  problem,  called  the  Sieve  of  Eratosthenes,  uses 
additions,  comparisons,  and  assignment  statements  to 
determine  the  prime  numbers  in  a  specified  range.  This 
algorithm  will  give  an  insight  into  how  all  three  of  these 
items  interact  to  affect  the  throughput  of  the  AMGS.  The 
third  test  is  a  bubble  sort  that  sorts  an  array  of  integers 
into ascending order. By using a loop construct,
comparisons, and a simple assignment statement, this
algorithm is an excellent example of a well structured
module that does comparisons and uses assignment
statements.
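
As an illustration of the operation mix in the second test, a
Sieve of Eratosthenes over a range can be sketched as follows.
This is not the test program itself; the function name and the
inclusive bounds are assumptions.

```python
def primes_between(low, high):
    """Return all primes p with low <= p <= high."""
    if high < 2:
        return []
    is_prime = [True] * (high + 1)
    is_prime[0] = is_prime[1] = False
    n = 2
    while n * n <= high:                          # comparison
        if is_prime[n]:
            for m in range(n * n, high + 1, n):   # repeated addition of n
                is_prime[m] = False               # assignment
        n += 1
    return [p for p in range(max(low, 2), high + 1) if is_prime[p]]
```

The inner loop contains nothing but an index addition, a bounds
comparison, and an assignment, which is why the sieve is a
convenient probe of how those three operations interact.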

Another  test  algorithm  is  a  Fast  Fourier  Transform 
(FFT)  written  in  two  parts  because  the  entire  program  would 
not  fit  into  the  WCS.  One  part  is  a  bit  reversal  program 
that  simply  assigns  elements  of  an  array  to  different 
locations  in  the  array.  The  other  part  is  the  complex 
multiplication  plus  a  Chebyshev  cosine  and  sine  generation 
routine  for  use  in  the  FFT.  The  bit  reversal  is  an 
excellent  comparison  of  assignment  statements  between  the 
two  methods  and  therefore  goes  in  the  bit  manipulation 
category.  The  FFT  complex  multiplication  is  another 
floating  point  multiplication  and  addition  algorithm.  By 
using  the  results  of  these  two  algorithms,  we  gain  an 
example  of  a  long  algorithm  that  uses  the  entire  WCS  (the 
FFT)  plus  an  algorithm  that  is  only  concerned  with  moving 
values around in memory (the bit reversal).
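
The bit reversal step of an FFT can be sketched as a permutation
of array elements. The version shown is a generic textbook form,
not the program used in the study, and it assumes the array
length is a power of two.

```python
def reverse_bits(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def bit_reverse_permute(values):
    """Return a copy of `values` reordered into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    out = list(values)
    for i in range(n):
        j = reverse_bits(i, bits)
        if i < j:                      # swap each pair only once
            out[i], out[j] = out[j], out[i]
    return out
```

For eight elements the indices 0 through 7 are rearranged to
0, 4, 2, 6, 1, 5, 3, 7; the work consists almost entirely of
assignments.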

The  final  test  is  an  algorithm  to  do  bit  manipulations 
using  the  bit  manipulating  functions  provided  by  both  the 
JRS  HLL  and  the  Fortran  library.  The  algorithm  takes  an 
array  of  integers  and  performs  different  operations  on  the 
integers  such  as  AND,  OR,  EXCLUSIVE  OR,  etc.  These 
operations  were  chosen  directly  from  the  example  NRL  source 
code,  so  this  test  specifically  tests  the  NRL  application. 

With  the  test  programs  now  fully  defined,  the  next  step 
is to describe the test runs and the timing mechanism used

IV. TESTING

The testing of the programs [...]
accurate tool available so that any error in the timing
mechanism would be minimized. That is why the timing
mechanism and its accuracy were so important to the results
of this study. Once the accuracy of the timing mechanism
was determined, the minimum length of the test was
specified to make the test length much longer than the
possible error. Besides the testing mechanism, there are
other aspects of program design that affect the execution
time of the resulting object code. Since this is primarily
a comparison between Fortran and the AMGS, both the factors
affecting the execution time of compiled Fortran code and
the factors affecting the AMGS were identified and
considered during the programming phase of the project.
The desire was to make the tests as equitable as possible
in the two different languages.

A.  TIMING  MECHANISM 

The VMS system library provides a software mechanism
for timing blocks of code. There are no hardware monitors
available to time individual programs and hand timing is
very inaccurate in a virtual memory system. The only
method that is relatively accurate, easy to use, and can
account for the virtual memory mechanism is the system
library timing function. There are two ways to use this
library function and both methods display precision to the
nearest one-hundredth of a second. The system manual
claims that actually calling the system library timing
function is more accurate than using the SECNDS Fortran
language feature (which uses the system library function).
[Ref. 11: p. C-30] Even though the claim of better
accuracy  is  not  substantiated  by  any  specific  numbers  in 
the  manual,  the  system  library  function  was  chosen  for 
these  tests. 

There  is  a  certain  amount  of  overhead  as  a  result  of 
each  library  call  and  since  this  overhead  cannot  be 
accurately  measured,  it  results  in  inaccuracy  which  must  be 
minimized.  To  time  a  segment  of  code  requires  two  calls  to 
the  library  routine  with  the  code  to  be  timed  sandwiched 
between  the  two  calls.  The  first  call  starts  the  timing 
and  the  second  call  records  the  time.  To  minimize  the 
impact  of  the  overhead  in  each  use  of  a  library  function, 
the  minimum  time  for  the  code  segment  execution  must  be 
much  longer  than  the  overhead.  For  this  study,  we 
determined  by  actually  testing  a  series  of  timing  calls 
that  the  upper  limit  of  the  overhead  for  each  library  call 
was  less  than  .005  seconds.  Therefore,  we  designed  the 
Fortran version (without common or subroutine) of each test
to last a minimum of two seconds. This means that the
overhead for the two library calls in that version is less
than one-half of one percent of the test length. All tests
lasted longer than one second except for one test (the
binary search microcode compacted version) and therefore
the possible error due to the timing is less than one
percent except in the one test that is shorter than one
second.
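
The error bound quoted above follows from dividing the worst-case
overhead of the two library calls by the test length. A short
sketch of that arithmetic, with hypothetical function names, is:

```python
CALL_OVERHEAD_S = 0.005   # measured upper limit per library call

def worst_case_error_fraction(test_seconds, calls=2, overhead=CALL_OVERHEAD_S):
    """Upper bound on timing error as a fraction of the test length."""
    return calls * overhead / test_seconds

def min_test_seconds(max_error_fraction, calls=2, overhead=CALL_OVERHEAD_S):
    """Shortest test that keeps the error bound at or below the target."""
    return calls * overhead / max_error_fraction
```

Two calls at 0.005 seconds each give 0.01 seconds, which is 0.5
percent of a two second test and 1 percent of a one second test.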


Because some of the algorithms being tested can be
accomplished very quickly (in less than 0.5 seconds) it is
important to increase the execution time. This was done by
repeating the algorithm several times to ensure that enough
time was spent in the algorithm to produce accurate
results. To accomplish this, the input data cannot be
changed during the program iteration and all iterative
counters must be reinitialized on each iteration. These
extra instructions do add overhead to the test but the
overhead is the same in each version of the test and
therefore the impact was considered to be minimal.
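
The repetition scheme can be sketched as a small timing harness.
It uses Python's own clocks rather than the VMS library routine,
and the function name and the fresh copy made on each pass are
assumptions about one reasonable way to keep the input unchanged.

```python
import time

def time_repeated(algorithm, data, repetitions):
    """Run `algorithm` on identical input `repetitions` times.

    Returns (cpu_seconds, elapsed_seconds) for the whole run.
    """
    start_cpu = time.process_time()
    start_wall = time.perf_counter()
    for _ in range(repetitions):
        work = list(data)      # fresh copy: every pass sees the same input
        algorithm(work)
    cpu = time.process_time() - start_cpu
    elapsed = time.perf_counter() - start_wall
    return cpu, elapsed
```

The copy adds the same overhead to every version of a test, so it
shifts all measurements equally instead of favouring one of them.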


When the timing mechanism is invoked it produces any of
five different values that are useful in analyzing the
amount of time spent in an algorithm. The first value
available is the elapsed time spent in the system, whether
executing or waiting. The second value is the total CPU
time that the algorithm being timed was executing. This is
the most important value since it displays the actual CPU
time the program required to execute. Next is the number
of buffered I/O requests and the number of direct I/O
requests. These numbers are not important in this study
since no I/O is being done during the timing periods. The
last available value is the total number of page faults
occurring during the timed period. This number is valuable
because it states how many times the job was interrupted
and waited for a new page of memory to be fetched. The
larger this value, the greater the chance for error because
the clock must be stopped and started for each page fault.
The fewer page faults and the closer the elapsed time is to
the CPU time, the less chance of inaccuracies due to
timing errors.

B.  LANGUAGE  FEATURES  AND  THE  EFFECT  ON  TIMING 

Before  looking  at  the  effects  of  the  language  features 
it  is  important  to  note  that  if  a  programmer  does  ’dumb’ 
things,  almost  any  algorithm  can  be  programmed 
inefficiently  in  any  language.  It  is  a  basic  assumption 
during  these  tests  that  the  algorithms  are  not  being 
programmed  poorly  and  every  effort  is  made  to  use  good, 
solid  algorithms.  Also,  since  the  same  algorithm  is  being 
programmed  in  both  languages,  any  bad  programming  practices 
will  be  present  in  both  versions  and  therefore  tend  to 
cancel  each  other  out. 

The  next  consideration  is  the  effect  of  language 
features on execution speed. In the JRS HLL, there are no
features that will affect execution time except for the
call to the FPA when doing floating point arithmetic.
Floating point arithmetic requires a separate algorithm
because of the data representation required. The data must
be represented in one word and that one word includes both
a mantissa and an exponent. The algorithm must separate
the mantissa and the exponent, perform the operation after
aligning the decimal point, and then store the mantissa and
exponent back into the single word of memory. Floating
point arithmetic is common in all block structured
arithmetic languages and therefore Fortran has the same
problem, but not to the same extent as the JRS HLL.
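
The separate, align, operate, and recombine sequence can be
sketched with Python's math.frexp and math.ldexp. The sketch
works on Python double precision values, not on the VAX single
word format, and the function name is hypothetical.

```python
import math

def fp_add(a, b):
    """Add two floats by explicit mantissa/exponent handling."""
    ma, ea = math.frexp(a)            # a == ma * 2**ea, 0.5 <= |ma| < 1
    mb, eb = math.frexp(b)
    e = max(ea, eb)                   # common exponent
    ma = math.ldexp(ma, ea - e)       # align both mantissas to it
    mb = math.ldexp(mb, eb - e)
    return math.ldexp(ma + mb, e)     # operate, then recombine
```

Even this short routine needs several steps around a single
addition, which suggests why floating point work carries more
overhead than integer work.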

The  Fortran  language  is  not  as  simple  as  the  JRS  HLL 
and  therefore  some  of  the  Fortran  language  features  affect 
the execution speed of the program. Fortran has several
different data access and parameter passing modes that do
affect  the  execution  time  of  a  program.  It  is  important  to 
design  tests  that  show  the  effects  of  different  uses  of 
these  features  on  the  execution  time  of  the  same  algorithm. 
Otherwise,  the  results  of  the  tests  will  only  be  valid  for 
the  language  features  being  used  in  that  specific  test  and 
could  not  be  generalized  for  any  program  in  the  testing 
category. 

Some of the language features of Fortran that affect
the execution time are common blocks, subroutines with
common blocks, and subroutines with parameters. Common
blocks affect the execution time of a program because the
blocks are placed in a specific location in memory which
results in more indexing and slower data access for each
item. If instead of using a common block the data is
simply declared in memory, there will be a shorter access
time for each data item and faster execution.

The use of subroutines adds overhead due to the linkage
conventions and activation record initialization that is
required. Each time a subroutine is called, the current
state (registers and program counter) must be saved in an
activation record to ensure that the state can be restored
when the subroutine is exited. When common blocks are
combined with the use of subroutines there is both the
overhead of the subroutine call and the overhead of
accessing the data items in the common data area. These
two added types of overhead result in increased execution
time when compared with code that does not use the
features. On the other hand, the features provide methods
of passing data that are not otherwise available.
Therefore the user must accept longer execution times in
exchange for modularity in design and ease of passing data
between subroutines.

The use of subroutines with parameter passing results
in even more overhead because of the requirement to set up
the data area for the parameters, pass the parameters upon
subroutine call, and return the new values of the
parameters upon subroutine termination. Again, this
language feature adds to the convenience and modularity of
the program, yet results in longer execution time.

It should be noted though that without common blocks or
parameter passing there is no way to pass data between
subroutines. Also, if a data area is large, parameter
passing may be very costly, even to the point of being
unusable. Another possibility is writing the program
without using subroutines or data passing mechanisms.
However, this usually results in programs that are difficult
and expensive to read and maintain. Since this is
unacceptable for most software projects, most programs are
written using some, if not all, of these features.

In  order  to  document  this  tradeoff,  all  programs  were 
tested  in  each  of  the  following  categories. 

1)  Fortran  without  use  of  a  common  block  or  subroutine. 

2)  Fortran  using  a  common  block  but  no  subroutine. 

3)  Fortran  using  a  common  block  and  a  subroutine. 

4)  JRS  HLL  using  a  common  block  and  a  subroutine. 

By testing each program using each of these methods, we can
identify the amount of time added by the use of each
language feature. The user can then weigh the use of the
JRS HLL depending upon what features are desired. The most
realistic comparison is between a Fortran program using
subroutines with common blocks and the JRS HLL because the
JRS HLL requires the use of both a subroutine call and a
common block. Besides, most large Fortran programs are
written using subroutines and common blocks so that the
resulting program is modularized yet allows for easy data
access.
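
The three Fortran categories can be pictured with one small
algorithm written three ways. The sketch below is in Python,
not Fortran, so it shows only the shape of the comparison;
the COMMON dictionary is a hypothetical stand-in for a common
block, and all function names are illustrative.

```python
import time

# Hypothetical stand-in for a Fortran common block.
COMMON = {"n": 10_000, "total": 0}


def straight_local(n):
    """Category 1: all data is local, no shared block, no subroutine."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def straight_common():
    """Category 2: same loop, every access goes through the shared block."""
    COMMON["total"] = 0
    for i in range(1, COMMON["n"] + 1):
        COMMON["total"] += i
    return COMMON["total"]


def summation_subroutine():
    """Subroutine that communicates only through the shared block."""
    COMMON["total"] = 0
    for i in range(1, COMMON["n"] + 1):
        COMMON["total"] += i


def subroutine_common():
    """Category 3: shared block plus a single subroutine call."""
    summation_subroutine()
    return COMMON["total"]


def mean_seconds(fn, *args, repeats=10):
    """Return (result, mean wall-clock seconds) over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = fn(*args)
    return result, (time.perf_counter() - start) / repeats
```

Calling mean_seconds on each variant gives one timing per
category while the returned results confirm that all three
variants compute the same answer.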

One  other  requirement  was  determined  during  the  testing 
due  to  the  VMS  operating  system  being  a  virtual  memory 
system.  When  preliminary  tests  were  made  it  was  determined 
that  other  users  seemed  to  have  an  effect  on  the  execution 
time  of  the  tests.  Therefore,  the  tests  were  made  under 
two  different  conditions.  One  condition  was  with  the 
system  clear  of  any  other  users.  The  other  condition  was 
with  other  users  on  the  system.  This  was  done  to  be  able 
to  document  the  difference,  if  any  difference  existed,  and 
clear  up  the  question  about  the  effect  of  other  users  on 
the  timing  of  a  program.  Chapter  Five  has  a  further 
explanation  of  the  timing  mechanism  accuracy  analysis. 

The main emphasis during the writing of the tests was
on making the programs equivalent. All versions of each
algorithm must do each step the same way so that the
comparison is fair, yet the tests must be programmed as a
'normal' programmer would do it in that language. If a
program is written in a special way that is known to be
optimal for one of the languages then the comparison of
execution times would favor that language. However, if it
is natural to use the feature in that language then that
was the way it was done. One example of this policy is in
testing the cosine function. Since the VAX 11/780 VMS
operating system provides a cosine library function, the
library cosine function was compared to the Chebyshev
approximation to see which method is faster. Thus both
methods (Chebyshev and the system library function) will
achieve approximately the same answer; however, the
algorithm used to achieve the answer will be different.
This special case is done to measure the effect of not
having a trigonometric function procedure available in the
JRS HLL library. Included in this test is the resulting
inaccuracy of the Chebyshev approximation, the space used
to store the routine in the WCS, and the execution speed.
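
For readers unfamiliar with the technique, a Chebyshev
approximation of the cosine can be sketched as follows. This
is a generic textbook construction (coefficients from values
at the Chebyshev nodes, evaluation by the Clenshaw
recurrence), not the routine used in the tests; the interval,
the number of terms, and the function names are assumptions.

```python
import math


def chebyshev_coefficients(f, terms, a, b):
    """Coefficients of the Chebyshev interpolant of f on [a, b]."""
    coefficients = []
    for j in range(terms):
        total = 0.0
        for k in range(terms):
            theta = math.pi * (k + 0.5) / terms
            # Chebyshev node mapped from [-1, 1] onto [a, b].
            x = 0.5 * (b - a) * math.cos(theta) + 0.5 * (b + a)
            total += f(x) * math.cos(j * theta)
        coefficients.append(2.0 * total / terms)
    return coefficients


def chebyshev_evaluate(coefficients, x, a, b):
    """Evaluate the series at x with the Clenshaw recurrence."""
    y = (2.0 * x - a - b) / (b - a)  # map x onto [-1, 1]
    d1 = d2 = 0.0
    for c in reversed(coefficients[1:]):
        d1, d2 = 2.0 * y * d1 - d2 + c, d1
    return y * d1 - d2 + 0.5 * coefficients[0]


# Build the table once, then reuse it for every evaluation.
COS_TABLE = chebyshev_coefficients(math.cos, 12, -math.pi, math.pi)


def chebyshev_cos(x):
    """Approximate cos(x) for x in [-pi, pi]."""
    return chebyshev_evaluate(COS_TABLE, x, -math.pi, math.pi)
```

Comparing chebyshev_cos with math.cos over the interval gives
the kind of inaccuracy figure mentioned above, and the number
of terms controls the trade between accuracy and speed.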

With the specification of the testing methods, testing
categories, and timing mechanisms, the next step is to
compare the results. A comparison of execution times of
each program in each category of testing was accomplished.
During the explanation of the comparison, an analysis of
the data and a summary of the results are given. This
analysis will allow us to specify which applications the
AMGS improves and which applications the AMGS does not
improve.


The results of the tests can now be analyzed and
compared since the factors affecting microprogramming and
the processes involved in testing have been reviewed. To
ensure that the analysis is complete, the raw data is
presented first followed by an analysis of the results.
The analysis will first compare the effects of language
features on each of the tests and then compare the results
of the different types of tests (i.e., while loop, do loop,
etc.). The final section of this chapter discusses the
validity of the tests and analyzes the error in the tests.

A.  RAW  DATA  ANALYSIS 

The raw data is given in Table 5-1. All tests were
programmed in the four categories explained in Chapter Four
but only five of the algorithms were hand compacted. The
times shown in Table 5-1 are the mean values of ten tests
of each algorithm without other users using the VAX 11/780.
The number in parentheses in the table, if shown, is the
value that should be added to or subtracted from the mean
value to define the 99% certainty range for the mean value.
If no number is shown in parentheses, then the value is one
hundredth of a second. An explanation of how the range was
determined is given in the last section of this chapter.


Table 5-1: Test Results (in seconds)

                NO COMMON    COMMON       COMMON                   HAND
                STRAIGHT     STRAIGHT     SUBROUTINE   JRS         OPTIMIZED
PROGRAM         FORTRAN      FORTRAN      FORTRAN      HLL         MICROCODE
--------------  -----------  -----------  -----------  ----------  ---------
While Loop      11.11        18.18        20.20        11.12
Do Loop          7.07        10.10        12.12        10.12
Factorial        4.61         7.46 (.02)   7.54         8.88        8.63
Summation        4.49         9.77 (.02)   9.77 (.01)   5.70        2.87
FFT             11.16 (.02)  13.08 (.02)  13.74 (.03)  17.37 (.02)
Sieve            3.39         4.18         4.10 (.02)   2.49
Binary Search    2.50         3.54         3.43         1.17        0.79
Bubble Sort      3.59 (.03)   4.58 (.03)   4.79 (.03)   3.77 (.02)  2.29
Quicker Sort     6.78 (.04)   9.75         9.80 (.02)   4.75        4.36
Bit Reversal     4.75 (.02)   7.49         7.50         2.25
Bit Manip        8.40 (.04)   8.53 (.07)   8.41 (.02)   2.98

Cosine and Cosine (Lib), values in scan order (column
positions not recoverable):
5.09   G.72   6.17   9.43 (.14)   6.24   8.62

Not all programs were hand compacted because the
compaction required special knowledge of VAX
microprogramming and also required a significant amount of
time. The tests to be compacted were chosen to ensure that
a representative sample was taken from each of the
categories. Another criterion for choosing the tests for
compaction was to choose some tests which were faster in
Fortran and some tests that were faster in microcode to
compare the effect of compaction. The basic purpose was to
determine, in general, how much better we could do with the
compaction without exerting a tremendous amount of effort.
That purpose was attained by compacting the five selected
programs.

When looking at the times from Table 5-1 in general,
some of the results were counter-intuitive because the
expected result is to have the microcoded version execute
faster. In many cases the Fortran versions were faster than
or as fast as the microcoded versions. This can be attributed
to three facts mentioned in earlier chapters. First, the
microcode that is generated from the HLL by this AMGS is
not compacted. Second, the Fortran compiler generates
highly optimized code. The third reason is that some of
the routines used as tests involve floating point
arithmetic or integer multiplication, both of which use the
floating point accelerator. The use of the floating point
accelerator results in increased overhead for microcode.
These three factors, separately or combined, resulted in
some cases where the Fortran outperformed the microcode.
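
The comparison can be made concrete by dividing the fastest
Fortran time by the JRS HLL time for each test. The sketch
below uses the no-common Fortran and JRS HLL columns of
Table 5-1; the cosine rows are left out because their column
positions are unclear, and the function name is illustrative.

```python
# (fastest Fortran seconds, JRS HLL seconds) from Table 5-1.
TIMES = {
    "While Loop": (11.11, 11.12),
    "Do Loop": (7.07, 10.12),
    "Factorial": (4.61, 8.88),
    "Summation": (4.49, 5.70),
    "FFT": (11.16, 17.37),
    "Sieve": (3.39, 2.49),
    "Binary Search": (2.50, 1.17),
    "Bubble Sort": (3.59, 3.77),
    "Quicker Sort": (6.78, 4.75),
    "Bit Reversal": (4.75, 2.25),
    "Bit Manip": (8.40, 2.98),
}


def hll_speedup(times):
    """Return Fortran time divided by HLL time for every test.

    A value above 1.0 means the JRS HLL microcode was faster.
    """
    return {name: fortran / hll for name, (fortran, hll) in times.items()}


if __name__ == "__main__":
    for name, ratio in sorted(hll_speedup(TIMES).items(), key=lambda p: p[1]):
        print(f"{name:14s} {ratio:5.2f}")
```

Sorting by the ratio separates the tests where Fortran wins
(ratio below 1.0) from those where the microcode wins.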

B.  EFFECTS  OF  LANGUAGE  FEATURES  ON  THE  TESTS 

The different Fortran language features were tested to
isolate the effects of the different techniques for data
passing. The important point is that the tests were
programmed as a 'normal' programmer would program them. No
special attempts were made to make specific tests run well
in either of the two languages. Since it was unlikely that
a determination could be made as to what the 'normal'
programmer would do, the three different Fortran tests were
devised so that the user could determine which method was
needed for his/her application. Of course, if a programmer
chose the Fortran without subroutines or common data areas,
then he/she was giving up the use of some very important
software engineering techniques.

In general, the tests of the different Fortran language
features resulted in more speed with fewer features and
less speed with more features. The fastest Fortran
technique in all cases was the version that used no
subroutines and no common data structure. The use of
common data areas with and without subroutines resulted in
somewhat unexpected data. The expected results were for
the versions using common data without subroutines to run
faster than the versions using common data areas with
subroutines. This occurred in most but not all of the
tests. In general, there was only a slight increase in
execution time when a subroutine with common was compared
with the same program with no subroutines but with common
data, which implies that the overhead of calling and
returning from a subroutine (without any parameters) is not
very significant. In fact, in most cases there was no
statistical difference (discussed in the last section of
this chapter) between the tests with subroutines and common
and the tests without subroutines but with common. One
possible explanation is that the variation in the length of
time to start and stop the timing mechanism is greater than
the length of time required to call and return from a
subroutine. Since there is only a single call in each
test, the results may not show any difference when the
subroutine is used.

The hand compaction of the JRS HLL microcode always
resulted in faster execution than the uncompacted JRS HLL
microcode. This is as expected since the hand compacted
code was derived from the JRS HLL microcode. In no case
were instructions expanded ("n" microinstructions encoded
into "n+k" microinstructions, where k > 0) and therefore no
increase in execution time for the hand compacted code was
expected. It should be realized that the method used for
generating the hand compacted microcode does not really
produce hand written microcode because the compaction was
done to an existing program. The microprogrammer did not
set up the problem according to his own liking. The
microprogrammer simply took the generated microcode and
compacted it using his knowledge of VAX
microprogramming. If microoperations could be combined
with other microoperations to reduce the total number of
microinstructions, they were combined. The important point
to remember is that the microcode was machine generated and
hand compacted.


Another point that must be mentioned about the data
analysis in general is the overhead involved in the JRS HLL
microcode. The length of time required to make the call to
the microcode plus the overhead involved in the use of
common data is not documented anywhere and cannot be
determined in this study because the timing mechanism is
not accurate enough. Therefore during the analysis of the
data, it is important to remember that when the JRS HLL
microcode is called there is a certain amount of overhead
in the call. This overhead is most likely more than the
overhead of a subroutine call in Fortran because the state
of the micromachine must be initialized. The other point
is that all data in the microcode is in a common data area
and therefore, as has been documented, requires extra time
to access. Probably the best comparison between Fortran
and JRS HLL is to use Fortran with common data and
subroutines because the overhead of the common data and the
subroutine calls approximately cancel each other out.
Therefore, it is possible to compare the actual speed of
each method rather than comparing the overhead involved in
each method.

The overhead involved in the subroutine call and the
common data area will not always be constant. If there are
only a few data items being accessed in the subroutine then
all of the data values can be placed in registers, which
reduces the access time. However, if an array or a large
number of variables are being accessed then it will take
longer to get the data in and out because of the use of a
common data area. The important point when looking at the
comparisons being made in the next few sections is that if
common data structures and subroutines are used in Fortran
(which is almost always done) then the execution speed will
not be as fast as the fastest Fortran test. If the
decision is made to not use the common data structures and
subroutines then the programmer will be giving up
modularity of design and other software engineering
techniques for faster execution.

C.  COMPARISON  OF  TEST  RESULTS 

This section will compare the results of the Fortran
versions with the HLL versions. The comparison will be
done within the four basic areas defined in Chapter Three.
Each test algorithm is available in an appendix in both the
Fortran implementation and the JRS HLL implementation. The
Fortran version of the algorithms available in the
appendices is the version in a subroutine with a common
data structure. The algorithms have been placed in the
appendices according to the basic area that they are
testing. Table 5-2 lists which appendices contain which
individual tests. The algorithms have been removed from the
individual test harnesses; however, an example harness
(Factorial Program) is available in Appendix E.


Table 5-2: Table of Tests in Appendices

Appendix A: Integer Mathematics
   1. Do Loop
   2. While Loop
   3. Summation
   4. Factorial

Appendix B: Floating Point Mathematics
   1. Fast Fourier Transform
   2. Chebyshev Cosine

Appendix C: Sorting/Searching
   1. Binary Search
   2. Quicker Sort
   3. Sieve of Eratosthenes
   4. Bubble Sort

Appendix D: Bit Manipulations
   1. Bit Manipulation
   2. Bit Reversal


1.  Integer  Mathematics 

The basic loops were included in the integer
mathematics category because all that occurs in the loop
construct is an increment and test until the condition is
met, at which time a jump out of the loop is executed.
This is very simple and uncomplicated so the expectation
was that the microcoded version would not be much better
than the Fortran version. In fact, the JRS HLL WHILE loop
was only as fast as the fastest Fortran version while the
fastest Fortran DO loop was much better than the JRS HLL DO
loop. The results imply that the Fortran code is highly
optimized.

Since each of the loop tests involves only one
variable, the common data area access time penalty cannot
be blamed since the variable was stored in a register.
There is the overhead of calling the subroutine and setting
up the data registers; however, that alone should not cause
the microcode to be as slow as or slower than Fortran. The
only logical answer is that the optimization and compaction
of the different codes has a large effect on the execution
speed. One other important point about the loops is that
in all cases the DO loop is faster than the WHILE loop.
This is most likely due to better optimization because the
looping variable is part of the loop construct while in the
WHILE construct the incrementing of the variable occurs
independently from the language construct.

The  next  test  was  the  summation  of  an  integer 
value.  This  test  measured  how  fast  addition  could  be  done, 
however,  since  each  summation  could  be  done  very  quickly,  a 
loop  construct  was  set  up  to  repeat  the  summation  10,000 
times.  The  results  were  that  the  fastest  Fortran  version 
was  slightly  faster  than  the  JRS  HLL  version,  even  after 
subtracting  the  overhead  of  the  WHILE  loop.  This  result 
was  not  expected  but  can  be  explained  as  the  result  of  lack 
of  code  compaction  because  when  the  summation  microcode  was 
compacted  by  hand,  the  execution  speed  became  significantly 
faster  than  the  fastest  Fortran  version. 


The final integer mathematics test is the factorial
program. The test was limited to a maximum input of 12
because 13! is beyond the limits of the storage capacity of
a four byte integer. Therefore, to make the test last long
enough for timing purposes, a loop was set up to calculate
the factorial 100,000 times prior to stopping.
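
The limit of 12 can be checked directly. The short sketch
below assumes a signed four byte integer, whose largest value
is 2**31 - 1; the function name is illustrative.

```python
import math

INT32_MAX = 2**31 - 1  # largest signed four byte integer (assumption: signed)


def largest_factorial_argument(limit=INT32_MAX):
    """Return the largest n for which n! does not exceed limit."""
    n = 1
    while math.factorial(n + 1) <= limit:
        n += 1
    return n
```

Here 12! is 479,001,600, which fits, while 13! is
6,227,020,800, which does not.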

The result of the factorial test validates the JRS
claim that integer multiplication is slow because of the
use of the FPA. After subtracting the overhead of the
loop, the JRS HLL is still twice as slow as the fastest
Fortran version. In fact, the slowest Fortran version,
using common data areas and a subroutine, is faster than
the JRS HLL microcode. Therefore, the AMGS should not be
used for integer multiplication intensive algorithms
because of the FPA overhead.

The JRS HLL did not result in any performance
improvements for any of the integer arithmetic tests
accomplished in this study. This was due either to a lack
of microcode compaction or to the use of the FPA for
integer multiplication.

2.  Floating  Point  Mathematics 

There were two algorithms for testing the floating
point mathematics applications, the Chebyshev Cosine
routine and the Fast Fourier Transform (FFT). Both
algorithms substantiated the JRS claim that floating point
calculations would not do well in microcode. The execution
time of the FFT HLL version was about one and a half times
that of the fastest Fortran version (17.37 versus 11.16
seconds in Table 5-1). The other Fortran versions were
of course slower than the fast version due to the use of
common and a subroutine call.

The  Chebyshev  Cosine  routine  gave  the  same  type  of 
results  as  were  attained  for  the  FFT,  a  slow  down  of  about 
80%,  caused  by  the  FPA.  However,  the  interesting  part  of 
this  test  is  in  comparing  the  speeds  of  the  Chebyshev 
Cosine  with  the  speed  of  the  Cosine  Library  function.  The 
overhead  of  the  Library  Function  call  is  very  high  because 
even  the  JRS  HLL  (which  is  the  slowest  Chebyshev  version) 
is  faster  than  the  Library  Function  test.  Therefore  it  is 
justifiable  to  say  that  while  the  use  of  the  HLL  for  doing 
trigonometric  computations  is  not  a  great  improvement,  this 
test  does  demonstrate  that  the  commonly  used  features  of  a 
language  can  be  costly  and  that  the  microcode  does  give 
slightly  better  performance  than  the  Library  Function. 

Both  tests  in  this  basic  area  support  the  JRS  claim 
that  floating  point  arithmetic  will  not  be  helped  when 
coded  in  JRS  microcode.  Since  that  point  has  been  well 
documented,  we  will  now  look  at  the  sorting  and  searching 
tests  to  see  what  kind  of  results  they  produce. 

3.  Sorting  and  Searching


There were four tests accomplished in this area and
three of the four gave results that were favorable for the
JRS HLL. The one test where the JRS HLL ended up being
slightly slower was the bubble sort algorithm. There was
no looping mechanism to subtract away from the problem and
the algorithm consists of only assignment statements and
comparisons. Therefore there is no explanation for the
slow performance except for the lack of compaction of the
microcode.

The Sieve of Eratosthenes program test resulted in
the JRS HLL version running about 25% quicker than the
fastest Fortran version. This result was expected since
the microcode is able to do comparisons rather quickly.
One  other  interesting  point  became  apparent  during  this 
test.  Since  the  tests  are  supposed  to  be  written  as  a 
’normal’  programmer  would  write  them,  it  is  sometimes 
easier  to  use  a  DO  loop  rather  than  a  WHILE  loop  or  vice 
versa.  However,  when  trying  to  get  code  to  execute 
quickly,  it  is  obvious  that  the  Fortran  DO  loop  is  much 
faster  than  the  Fortran  WHILE  loop  as  shown  in  Table  5-1. 
On  the  other  hand,  the  JRS  HLL  DO  loop  is  not  nearly  as 
fast  as  the  Fortran  DO  loop  and  only  slightly  faster  than 
the  JRS  HLL  WHILE  loop.  Therefore,  a  program  is  dependent 
upon  the  language  construct  chosen  by  the  individual 
programmer  and  if  a  DO  loop  is  used  in  the  Fortran  version 
while  a  WHILE  loop  is  used  in  the  JRS  HLL  version,  there 
will  be  a  greater  difference  in  results. 

To avoid this discrepancy in results (after it was
noticed in the initial results), the Sieve algorithm was
rewritten in both languages using DO loops because a
definite iteration (the DO loop function) was what was
needed in the algorithm. The change in speed of the
algorithms due to the use of the DO loop was not
tremendous. However, this test does demonstrate the effect
of using different language constructs plus the use of
'normal' programming techniques and constructs.
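
A Sieve of Eratosthenes written only with definite iteration
might look like the sketch below. It is in Python, not in
Fortran or the JRS HLL, and it is not the program from the
appendices; it only shows that the algorithm needs nothing
beyond counted loops, comparisons, and assignments.

```python
import math


def sieve(limit):
    """Return all primes up to and including limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    # Outer counted loop over candidate factors.
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Inner counted loop strikes out the multiples of i.
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i in range(limit + 1) if is_prime[i]]
```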

The  Quicker  Sort  algorithm  demonstrated  the  speed 
of  the  microcode  as  was  expected.  Since  only  comparisons, 
additions,  and  subtractions  with  one  multiply  are  used, 
this algorithm is very fast. The Binary Search algorithm
results  ended  up  with  the  JRS  HLL  being  twice  as  fast  as 
the  fastest  Fortran  version.  Again,  this  was  expected 
because  of  the  use  of  comparisons  during  most  of  the 
algorithm.  This  algorithm  produced  the  second  best 
performance  increase  by  the  JRS  HLL  microcode  of  all  of  the 
tests.  This  was  probably  due  to  the  fact  that  the 
algorithm  has  only  one  DO  loop,  one  WHILE  loop,  and  the 
rest  of  the  algorithm  is  made  up  of  if-then  constructs 
which  are  simple  comparisons. 
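
The comparison-heavy character of a binary search is easy to
see in code. The following is a generic sketch in Python, not
the program from the appendices: apart from the loop control,
every step is a comparison or an assignment.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return -1
```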

The sorting and searching tests were a good
application of the JRS HLL microcode. For the most part,
the microcode resulted in faster execution speed than the
corresponding Fortran program; however, the increase was
never much more than twice as fast.

4.  Bit  Manipulations

The last basic area of the tests is the bit
manipulation area. Two tests were accomplished in this
area and both gave positive results for the microcode
version. The bit reversal program ended up with a large
increase in execution speed. The program was simply used
to switch items in an array. No comparing was needed since
the program switched the items in the array according to a
convolution scheme. This test demonstrates the speed of
the assignment statement in the microcode.
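
One common form of bit reversal, the reordering used before
an FFT, is sketched below. Whether the tested program used
exactly this scheme is an assumption; the sketch only shows
how array items can be switched by index arithmetic and
assignment alone, without comparing the data values.

```python
def reverse_bits(index, width):
    """Return index with its lowest `width` bits in reverse order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reverse_permute(items):
    """Swap items[i] with items[reverse_bits(i)] in place.

    The length of items is assumed to be a power of two.
    """
    width = len(items).bit_length() - 1
    for i in range(len(items)):
        j = reverse_bits(i, width)
        if i < j:  # index test only, so each pair is swapped once
            items[i], items[j] = items[j], items[i]
    return items
```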

The  bit  manipulation  program  also  resulted  in 
faster  execution  for  the  JRS  HLL  than  for  the  fastest 
Fortran  version.  The  main  reason  for  this  fast  execution 
is  that  the  Fortran  version  uses  system  library  routines 
which  are  slow  to  call  and  execute.  Therefore,  it  is 
actually  the  slowness  of  the  Fortran  library  routine  rather 
than  the  speed  of  the  microcode  that  gives  the  increased 
throughput.  The  important  point  is  that  the  microcode  does 
improve  upon  the  execution  speed  of  the  corresponding 
Fortran  code  and  therefore  the  AMGS  gives  a  performance 
increase  for  these  kinds  of  operations.  It  is  also 
important  to  note  that  this  program  was  simply  a  series  of 
calls  to  the  microcode  or  Fortran  routines  that  perform  the 
functions. No other operations besides the driving DO loop
were needed in the algorithm and therefore it was a very
accurate test of the actual speed of the tested code.


D.  TEST  ERROR  ANALYSIS

Because this testing was done on a virtual memory
system, there was a possibility of error due to the timing
mechanism being switched on and off many times. The
intention of the tests was to give the user an accurate
estimate of how much speed would be gained by using the
AMGS. To ensure that the estimate is as accurate as
possible, a computation was made to determine the
confidence interval for the mean. Also, to determine if
the virtual memory system was affecting the results, a test
was performed that allows us to state, with a specified
amount of confidence, whether the virtual memory system
affects the results.

Since we made several runs of each test, we were able
to determine a mean execution time for each test and a
standard deviation for each test. However, it is important
to do a statistical analysis to determine how confident we
are of these results. The question of confidence was
answered by using the Student T distribution (because of
the small sample size) to find the interval within which
the mean will fall with the specified amount of confidence.
For these tests, a confidence of 99% was desired. The
following formula was used to determine the range of the
mean execution time for 99% confidence. The value for 't'
is dependent on the level of confidence desired and was
read from a Student T distribution chart. [Ref. 12: p. 488]
'X' is the mean of the sample, 'S' is the sample standard
deviation, 'n' is the number of runs in the sample, and 'u'
is the true mean.

X - t(S / SQRT(n)) < u < X + t(S / SQRT(n))
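
The range computation can be sketched as follows, using the
standard form of the Student T interval in which the sample
standard deviation is divided by the square root of the
sample size. The critical value 3.250 is the two-sided 99%
value for 9 degrees of freedom (ten runs); the sample times
in the usage note are made up for illustration.

```python
import math
import statistics

T_99_DF9 = 3.250  # two-sided 99% Student T value, 9 degrees of freedom


def confidence_range(samples, t=T_99_DF9):
    """Return (low, high) bounds for the mean of samples."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)      # sample standard deviation
    half_width = t * s / math.sqrt(n)  # the number shown in parentheses
    return mean - half_width, mean + half_width
```

For ten hypothetical runs clustered around 4.61 seconds with
a spread of about 0.01 seconds, the half width comes out
below one hundredth of a second, which is the size of the
parenthesized entries in Table 5-1.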

To  find  tha  affect  of  the  virtual  memory  system 
required  performing  each  test  under  two  different 
conditions.  First,  each  tost  Mas  made  Mith  other  users  on 
tha  system.  This  could  be  anywhere  from  one  other  user  to 
twenty  users.  Next,  each  test  was  performed  with  all  other 
users  locked  out  of  the  system  and  the  entire  computer 
running  only  the  system  support  programs  and  the  tests  for 
this  project.  Then  a  hypothesis,  called  the  null 
hypothesis  (HI)  was  assumed.  The  null  hypothesis  was  that 


both  samples  came  from  the 


population.  To  test  the 


null  hypothesis  we  used  the  following  formula,  where  XI  and 
X2  are  means,  SI  and  S2  are  standard  deviations,  and  nl  and 
n2  are  sample  sizes  (in  this  test,  10). 

t  -  (XI  -  X2)  /  SORT ( (Sl/nl  +  S2/n2) ) 

If the calculated 't' (from above) >= 't' (from the
chart based on 99% certainty), then the null hypothesis can
be rejected. In other words, the samples do not come from
the same population, which means that the number of users
on the system does affect the results. If the calculated
't' < 't' (from the chart), then the samples could be from
the same population and the other users on the system may
not affect the results. [Ref. 12: pp. 214-221]
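
The comparison of the two test conditions can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names and timing values are hypothetical, the terms under the square root are treated as variances (standard deviation squared), the magnitude of t is compared so that the order of the two samples does not matter, and the chart value 2.878 assumes a two-sided 99% test with 18 degrees of freedom.

```python
import math
import statistics


def t_statistic(sample1, sample2):
    """t = (X1 - X2) / sqrt(S1^2/n1 + S2^2/n2) for two independent samples."""
    x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    # Raises ZeroDivisionError if both samples have zero variance.
    return (x1 - x2) / math.sqrt(v1 / len(sample1) + v2 / len(sample2))


def null_hypothesis_rejected(sample1, sample2, t_critical):
    """Reject 'both samples come from the same population' when |t| >= chart value."""
    return abs(t_statistic(sample1, sample2)) >= t_critical


# Hypothetical timings in seconds; these are not data from the study.
with_users = [4.31, 4.18, 4.45, 4.22, 4.60, 4.15, 4.38, 4.27, 4.52, 4.20]
locked_out = [4.12, 4.09, 4.15, 4.11, 4.10, 4.13, 4.08, 4.14, 4.12, 4.11]

T_CHART = 2.878  # assumed chart value: two-sided 99%, 18 degrees of freedom
print("t =", round(t_statistic(with_users, locked_out), 3))
print("reject null hypothesis:", null_hypothesis_rejected(with_users, locked_out, T_CHART))
```

The critical value is passed in rather than computed so that the sketch mirrors the practice of reading it from a chart.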
The results of the first confidence test mentioned
above were enumerated in Table 5-1 with data taken on the
VAX 11/780 with all other users locked out of the system.
In general, the results were very accurate in that they
gave a small range in which the anticipated results would
fall. The null hypothesis test gave mixed results. It was
hoped that we would be able to state that the tests with
other users on the system would be from a different sample
set than the tests without other users. However, that was
not the case in general. In most situations, the tests
with other users on the system simply showed a higher mean,
but the possible range for 99% certainty included most or
all of the range for the test without other users.
Therefore, in the second test the null hypothesis could not
be refuted in most cases. However, it does appear that
other users on the system do affect the timing mechanism,
but only because they increase the standard deviation of
the tests and thereby widen the range of values for 99%
certainty.
The purpose of this project was to evaluate the
performance of the JRS AMGS. This has been accomplished by
comparing the performance of the JRS HLL microcode with
Fortran code on the VAX 11/780. The testing has produced
some unexpected results and has shed light on several
interesting points. The first point is that microcode
will not always result in faster execution of an algorithm.
During the testing it became apparent that this was due
mainly to two causes. One reason is that for the speed of
microcode to be fully utilized, the microcode must be
properly compacted. The other is that the use of the FPA
by the microcode results in slightly degraded performance.

The second point is the effect of the different
language features upon the execution speed of the Fortran
code. When the fastest Fortran code was compared with the
microcode there were several cases where the Fortran was
much faster than the microcode. However, when the slowest
Fortran code was compared with the HLL microcode the
microcode was faster. This was true in all cases except
when the FPA was required. Testing the effects of the
language features revealed an important point, since the use
of the features allows a programmer to use software
engineering techniques. When these features are not used
it is very difficult for a programmer to use software
engineering techniques such as modularity and information
hiding. Without these techniques the code may run fast but
it is usually very difficult to develop and always hard to
maintain. Therefore, a tradeoff must be made between the
convenience and security of the language features and the
speed advantage possible without the features.

A.  CONCLUSIONS  FROM  THE  DATA  ANALYSIS 

The analysis of the data allows for some conclusions to
be drawn about the use of the AMGS for specific
applications. The conclusions are grouped in terms of the
four general areas defined in Chapter Four rather than
about individual tests so that a user may make a decision
based upon a general category of application rather than a
specific example program. Specific program results will be
mentioned if the results of that test vary significantly
from the other tests in the specific area being discussed.

The integer mathematics application resulted in no
advantage from the use of the AMGS. This is most likely
due to the lack of compaction of the microcode. This
conclusion is justified because when the summation
program's microcode was compacted and subsequently
executed, the result was a significant increase in
execution speed. Therefore, it is assumed that if the code
were properly compacted the execution speed would be
improved. The only test in the integer mathematics
category that would not be greatly improved by the
compaction is the factorial test. This is due to the use
of the FPA for integer multiplication.

The floating point mathematics area also turned out not
to be a good application for the AMGS. This was expected
and the probability that this would happen is documented in
the JRS HLL manual. The difference in the magnitude of the
execution speeds is interesting because the JRS HLL runs
about 60% slower than the fastest Fortran version.

The sorting and searching application area demonstrated
promising results for the AMGS. In three of the four tests
the AMGS version was significantly faster than the fastest
Fortran version. In one test (the bubble sort), the
Fortran was faster than the AMGS, but this is probably due
to a lack of compaction rather than to a lack of
applicability to the AMGS. From the results of these four
tests, it is justifiable to say that sorting and searching
are both good application areas for using the AMGS.
However, it should be noted that at this point in the AMGS
development, the difference in execution speeds is not as
good as it could be with compacted microcode.

The bit manipulation area also produced favorable
results for the AMGS. In fact, this was the best
applications area of the JRS HLL because both tests ended
up more than doubling the speed of the Fortran code. Of
course, one of the tests was slow in the Fortran version
because of the use of library functions; however, since
that was the only way to easily perform that function in
Fortran, that was the way it was programmed.

Now that we have defined the areas where improvement is
possible, the question remains whether the AMGS should
be used by NRL. The answer to this must be based on more
factors than simply execution speed. We must also consider
system cost, ease of use, and actual improvement possible.

Since the improvement is at the best two to three times
better than the Fortran code, the cost in money and
programming effort cannot be justified by the possible
gain. When the system is improved to include microcode
compaction with a resultant increase in performance, then
the AMGS cost may be justified. Somewhere in the area of
an order of magnitude increase in speed is necessary before
the cost of the system (money and programming effort) is
justified.

The AMGS did prove capable of producing microcode that
is as fast or slightly faster than the compiled Fortran
code. Therefore, if an application exists that will use a
microcoded machine, the AMGS is capable of producing a
large amount of 'acceptable' microcode. The AMGS can
produce the microcode very quickly in comparison to
conventional methods. Also, the AMGS can produce large
amounts of microcode at much less expense than is possible
with hand microcoding. The AMGS therefore provides a
mechanism for producing 'acceptable' microcode efficiently
and inexpensively.

One other possible use of the AMGS is to produce
microcode that can be hand compacted. If the
microprogrammers are available, the HLL can be used to
produce an uncompacted microprogram and then the
microprogrammers can be used to compact the HLL microcode.
This technique produced very good results during the study
and the cost in microprogrammer's time is much less than
writing a complete microprogram from scratch.

B.  FUTURE  RESEARCH  POSSIBILITIES 

There are several areas that can be researched as a
continuation of this work. Some areas relate directly to
this type of microcode generating system, but other areas
are points that became obvious during the study yet had to
be ignored to keep the scope of the thesis within reason.
One area of research is to evaluate the next version of the
JRS AMGS. The next version is now available and has
microcode compaction, which should result in much better run
time results. Also, the revision has more language
constructs that more closely parallel the constructs
available in the more modern block structured languages.
These revisions should make the system easier to use and
give better results.
Since one of the suggested advantages of the AMGS is
portability of the JRS HLL microcode, it is important for
this system to be implemented on another machine so that
the work involved in doing such a job can be documented.
The possibility of implementing the AMGS on another machine
is already a stated goal, but until it is done, a proper
testing of both implementations cannot be made. The
comparison of the results of the tests would document the
portability of the system and demonstrate the ease with
which the machine transition could be made. It would also
be advantageous to have another language such as Fortran or
Pascal used as the source code instead of the JRS HLL.
This would make the AMGS accessible to more people,
resulting in a better chance of the system becoming more
widely used.

The  cost  of  using  different  language  features  in 
Fortran  was  interesting  even  though  it  was  a  sidelight  of 
the  study.  Further  study  could  be  done  as  to  the  exact 
cost  of  using  a  subroutine  with  or  without  parameters. 
Also,  the  actual  cost  of  using  a  common  data  area  could  be 
documented  so  that  a  user  knows  how  much  the  use  of  such  a 
feature  is  costing.  Of  course  this  kind  of  testing  would 
be  system  dependent,  but  if  that  system  used  these  language 
constructs  for  a  significant  amount  of  work,  the  results 
could  be  very  helpful  in  making  decisions  during  future 
programming  efforts. 
The final suggestion for further research has to do
with defining the application areas. It would be very
helpful if there were some guidelines as to what
applications use what operations. These guidelines would
be very helpful during future system performance evaluation
efforts.
APPENDIX A


INTEGER MATHEMATICS ALGORITHMS


C  THE DO LOOP IN A FORTRAN SUBROUTINE
C  'I' IS THE LOOP VARIABLE WHILE 'K'
C  IS THE TOTAL NUMBER OF TIMES THE LOOP
C  WILL BE EXECUTED.

      SUBROUTINE DOLOOP

      COMMON /WCS/ I,K


\ THIS PROGRAM IS A DO LOOP WRITTEN IN JRS HLL \

PROGRAM DOLOOP;
   INTEGER I,K;
   DO I = 1 TO K;
   ENDDO;
   STOP;
END.  \ OF DOLOOP \


C  THIS IS THE WHILE LOOP IN FORTRAN
C  COUNT HOLDS THE TOTAL NUMBER OF TIMES
C  THE LOOP WILL BE EXECUTED.  ZERO HOLDS
C  THE VALUE 0.

      SUBROUTINE WHILELOOP
      INTEGER COUNT, ZERO
      COMMON /WCS/ COUNT, ZERO

      DO WHILE (COUNT .GT. ZERO)
         COUNT = COUNT - 1
      END DO

      END   ! OF WHILELOOP


C  THIS IS THE SUM ALGORITHM IN FORTRAN
C  'COUNT' IS THE NUMBER OF TIMES THE SUMMATION WILL
C  BE COMPUTED.  'VALUE' IS THE NUMBER TO BE SUMMED.
C  'TEMP' IS A STORAGE LOCATION FOR 'VALUE'.  'TOTAL'
C  IS THE VALUE OF THE SUMMATION.  'ZERO' HOLDS THE
C  VALUE 0.

      SUBROUTINE SUM
      INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO
      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO

      ZERO = 0
      DO WHILE (COUNT .GT. ZERO)

C  REINITIALIZE THE VARIABLES FOR THE SUM ROUTINE
         COUNT = COUNT - 1
         VALUE = TEMP
         TOTAL = ZERO

C  THIS IS THE ACTUAL SUMMING OF THE VALUE
         DO WHILE (VALUE .GT. ZERO)
            TOTAL = TOTAL + VALUE
            VALUE = VALUE - 1
         END DO

      END DO

      END   ! OF SUM


\ SUMMATION ALGORITHM USING JRS HLL \

PROGRAM SUMMATION;
   INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO;
   DO WHILE (COUNT .GT. ZERO);
      COUNT = COUNT - 1;
      VALUE = TEMP;
      TOTAL = 0;
      DO WHILE (VALUE .GT. ZERO);
         TOTAL = TOTAL + VALUE;
         VALUE = VALUE - 1;
      ENDDO;
   ENDDO;
   STOP;
END.


C  THE FACTORIAL SUBROUTINE IN FORTRAN
C  'COUNT' DETERMINES HOW MANY TIMES THE FACTORIAL
C  OF 'VALUE' WILL BE DETERMINED.  'TOTAL' HOLDS
C  THE ANSWER AND IS INITIALIZED TO 1.  'TEMP'
C  HOLDS THE FACTORIAL VALUE TO BE DETERMINED
C  FOR REUSE.

      SUBROUTINE FAC
      INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO, ONE
      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO, ONE

      DO WHILE (COUNT .GT. ZERO)
         COUNT = COUNT - 1
         VALUE = TEMP
         TOTAL = ONE

         DO WHILE (VALUE .GT. ZERO)
            TOTAL = TOTAL * VALUE
            VALUE = VALUE - 1
         END DO

      END DO

      END   ! OF FAC


\ THE FACTORIAL PROGRAM IN JRS HLL \

PROGRAM FACTORIAL;
   INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO, I;
   DO WHILE (COUNT .GT. ZERO);
      COUNT = COUNT - 1;
      VALUE = TEMP;
      TOTAL = 1;
      DO WHILE (VALUE .GT. ZERO);
         TOTAL = TOTAL * VALUE;
         VALUE = VALUE - 1;
      ENDDO;
   ENDDO;
   STOP;
END.


APPENDIX B


FLOATING POINT MATHEMATICS ALGORITHMS


      SUBROUTINE FFT
C
C  ********  A FAST FOURIER TRANSFORM  ********
C
C  X - COMPLEX ARRAY X(2**M)
C  M - ORDER OF FFT, N = 2**M
C
C  BASED UPON AN FFT FIRST DEVELOPED BY SIGNALS
C  SCIENCE CORPORATION FOR PROJECT SALESCLERK.
C
C  FIRST TRANSCRIBED BY LCDR C LAURVICK, USN
C  MODIFIED BY LT HARTONG, USN
C
      REAL XREAL(4096),XIMAG(4096),TREAL,TIMAG,T2REAL,
     1     T2IMAG,UREAL,UIMAG
      DATA PI/3.14159265/
      N = 2**M
C
C  M STAGE FOURIER TRANSFORM
C
      DO 20 L=1,M
      LE0 = 2**(M+1-L)
      LE1=LE0/2
      UREAL = 1.0
      UIMAG = 0.0
      PHASE = PI/FLOAT(LE1)
C     W=CMPLX(COS(PHASE),-SIN(PHASE))
      IF (PHASE .GT. (PI/2.0)) THEN
         PHASE2 = PI - PHASE
      ELSE
         PHASE2 = PHASE
      END IF
      COSX = 0.99995795 - (0.49924045 * PHASE2 * PHASE2)
      COSX = COSX + (0.03962674 * PHASE2 * PHASE2 *
     1       PHASE2 * PHASE2)
      IF (PHASE .GT. (PI/2.0)) COSX = -COSX
C
C  CALCULATE SIN
C
      IF (PHASE .LT. (PI/2.0)) THEN
         PHASE2 = PI/2.0 - PHASE
      ELSE
         PHASE2 = PI - (3.0 * PI/2.0 - PHASE)
      ENDIF
      SINX = 0.99995795 - (0.49924045 * PHASE2 * PHASE2)
      SINX = SINX + (0.03962674 * PHASE2 * PHASE2 *
     1       PHASE2 * PHASE2)
C
C  DECIMATION IN TIME
C
      DO 20 J=1,LE1
      DO 10 I=J,N,LE0
      IP=I+LE1
      TREAL = XREAL(I) + XREAL(IP)
      TIMAG = XIMAG(I) + XIMAG(IP)
      T2REAL = XREAL(I) - XREAL(IP)
      T2IMAG = XIMAG(I) - XIMAG(IP)
      XREAL(IP) = T2REAL * UREAL - T2IMAG * UIMAG
      XIMAG(IP) = T2REAL * UIMAG + T2IMAG * UREAL
      XREAL(I) = TREAL
      XIMAG(I) = TIMAG
   10 CONTINUE
      UREAL = UREAL * COSX - (UIMAG * -SINX)
      UIMAG = (UREAL * -SINX) + UIMAG * COSX


PROGRAM FFT;

\ ******  FAST FOURIER TRANSFORM - AMGS HLL VERSION  ****** \

   INTEGER I,J,M,N,L,LE0,LE1,IP,KOUNT,K,NV2,NM,NM1,P;
   REAL UREAL,UIMAG,PHASE,COSX,SINX,TREAL,TIMAG,
        T2REAL,T2IMAG;
   REAL TMP,PI,R1,R2,R3,K3;
   REAL ARRAY XREAL(4096), XIMAG(4096);

   \ DOES NOT DO BIT REVERSAL \
   \ N = 2**M \
   N = 1;
   DO KOUNT = 1 TO M;
      N = 2 * N;
   ENDDO;

   \ M STAGE FOURIER TRANSFORM \
   \ EXECUTE THE LOOP 10 TIMES FOR TIMING PURPOSES \
   DO K = 1 TO 10;
   DO L = 1 TO M;
      \ REPLACE WITH MULTIPLE EXPANSION DUE TO NO EXPONENTS \
      \ LE0 = 2**(M+1-L); \
      LE0 = 1;
      DO KOUNT = 1 TO (M+1-L);
         LE0 = 2*LE0;
      ENDDO;
      LE1 = LE0/2;
      UIMAG = 0.0;
      UREAL = 1.0;
      PHASE = PI/FLOAT(LE1);
      \ W = CMPLX(COS(PHASE),-SIN(PHASE)) \
      \ GENERATE SIN AND COS \
      IF (PHASE .GT. (PI/2.0)) THEN
      DO;
         TMP = PI - PHASE;
      ENDDO
      ELSE
      DO;
         TMP = PHASE;
      ENDDO;
      COSX = R1 - (R2 * TMP * TMP) +
             (R3 * TMP * TMP * TMP * TMP);
      IF (PHASE .GT. (PI/2.0)) THEN
      DO;
         COSX = -COSX;
      ENDDO;
      \ CALCULATE SIN \
      IF (PHASE .LT. (PI/2.0)) THEN
      DO;
         TMP = PI/2.0 - PHASE;
      ENDDO
      ELSE
      DO;
         TMP = PI - (K3 * PI/2.0 - PHASE);
      ENDDO;
      SINX = R1 - (R2 * TMP * TMP) +
             (R3 * TMP * TMP * TMP * TMP);
      \ DECIMATION IN TIME \
      DO J = 1 TO LE1;
         DO I = J TO N BY LE0;
            IP = I + LE1;
            TREAL = XREAL(I) + XREAL(IP);
            TIMAG = XIMAG(I) + XIMAG(IP);
            T2REAL = XREAL(I) - XREAL(IP);
            T2IMAG = XIMAG(I) - XIMAG(IP);
            XREAL(IP) = (T2REAL * UREAL) -
                        (T2IMAG * UIMAG);
            XIMAG(IP) = (T2REAL * UIMAG) +
                        (T2IMAG * UREAL);
            XREAL(I) = TREAL;
            XIMAG(I) = TIMAG;


C  THIS IS THE FORTRAN CHEBYSHEV COSINE ROUTINE.
C  THE COSINE OF ALL INTEGER ANGLES FROM 0 TO
C  180 DEGREES IS COMPUTED.  THE COSINE OF EACH
C  ANGLE IS COMPUTED 'K' TIMES FOR TIMING
C  PURPOSES.

      SUBROUTINE COSINE
      INTEGER I,J,K,L,M,N
      REAL PI, TEMP, R1, R2, R3, LIMIT, FANS
      COMMON /WCS/ I, J, K, L, M, N, PI, TEMP, R1, R2, R3,
     1             LIMIT, FANS(1:180)

      DO WHILE (J .LE. K)
         I = 0
         DO WHILE (I .LE. 180)
            IF (I .LE. 90) THEN
               TEMP = ((I*PI)/LIMIT)
            ELSE
               TEMP = (((M - I) * PI)/LIMIT)
            END IF
            FANS(I) = R1-(R2*TEMP*TEMP) +
     1                (R3*TEMP*TEMP*TEMP*TEMP)
            IF (I .GT. 90) FANS(I) = FANS(I) * (-1)
            I = I + 1
         END DO
         J = J + 1
      END DO

      END   ! OF COSINE




\ THIS SUBROUTINE IS WRITTEN IN JRS HLL AND CALCULATES THE
  COSINE OF THE ANGLES FROM L TO M DEGREES.  THE DEGREES
  ARE FIRST CONVERTED TO RADIANS AND THEN THE CHEBYSHEV
  APPROXIMATION IS USED TO FIND THE VALUE OF THE COSINE.

  THE LOOP IS EXECUTED K TIMES TO ALLOW FOR TIMING OF THE
  PROCEDURE. \

PROGRAM COSINE;
   INTEGER I,J,K,L,M,N;
   REAL PI,TEMP,R1,R2,R3,FACTOR;
   REAL ARRAY HANS(180);

   DO J = 1 TO K;      \ LOOP TO CONTROL THE NUMBER OF TIMES
                         THE COSINES ARE CALCULATED \
      DO I = L TO M;   \ LOOP TO CONTROL WHAT ANGLES
                         TO CALCULATE THE COSINE FOR \
         IF (I .LE. 90) THEN
            TEMP = ((FLOAT(I) * PI)/FACTOR)
         ELSE
            TEMP = ((FLOAT(M-I) * PI)/FACTOR);
         HANS(I) = R1 - (R2*TEMP*TEMP) +
                   (R3*TEMP*TEMP*TEMP*TEMP);
         \ CORRECT THE SIGN OF THE ANSWER \
         IF (I .GT. 90) THEN
            HANS(I) = (-HANS(I));
      ENDDO;




APPENDIX C

SORTING/SEARCHING ALGORITHMS


C  THIS SUBROUTINE LOOKS FOR EACH ITEM IN THE
C  ARRAY 'KEYS' STARTING WITH THE FIRST ITEM AND
C  ENDING UP WITH THE LAST ITEM.  EACH TIME AN
C  ITEM IS FOUND THE INDEX OF THE DESIRED ITEM
C  IS COMPARED WITH THE INDEX OF THE FOUND ITEM
C  TO INSURE THAT THE CORRECT ITEM WAS FOUND.
C  IF AN IMPROPER ITEM IS FOUND THEN THE COUNT
C  OF ERRORS IS INCREMENTED BY ONE.

      SUBROUTINE BINARY SEARCH
      INTEGER KOUNT,RESULT,SIZE1,KEYS,K,UPPER,LOWER,I,J,
     1        ERRORS,F
      COMMON /WCS/ KEYS(10000),KOUNT,RESULT,SIZE1,UPPER,
     1             LOWER,I,J,F,ERRORS,K

C  LOOP THROUGH EVERY ELEMENT OF THE ARRAY
C  AND LOOK FOR EACH ELEMENT ONCE.
      DO J = 1,SIZE1

C  INITIALIZE THE CONSTANTS AND VARIABLES
         KOUNT = 0
         RESULT = KEYS(J)
         UPPER = SIZE1
         LOWER = 1
         F = .FALSE.

         IF (RESULT .LT. KEYS(LOWER)) THEN
            RETURN
         ELSE IF (RESULT .GT. KEYS(UPPER)) THEN
            RETURN
         ELSE
            DO WHILE (F .EQ. .FALSE.)
               I = (UPPER + LOWER + 1)/2
               IF (RESULT .LT. KEYS(I)) THEN
                  UPPER = I - 1
               ELSE IF (RESULT .GT. KEYS(I)) THEN
                  LOWER = I + 1
               ELSE IF (RESULT .EQ. KEYS(I)) THEN
                  F = .TRUE.
               ELSE
                  RESULT = -3
               ENDIF
               IF (UPPER .LT. LOWER) THEN
                  F = .TRUE.
               ELSE
                  KOUNT = KOUNT+1
               ENDIF
            END DO
         END IF

         IF (I .NE. J) ERRORS = ERRORS + 1

      END DO

      END   ! OF BINARY SEARCH SUBROUTINE


\ BINARY SEARCH PROGRAM WRITTEN IN JRS HLL \
\ WHEN THE RESULT IS ASSIGNED A NEGATIVE \
\ VALUE, THERE IS AN ERROR IN THE RESULTS \

PROGRAM BSEARCH;
   INTEGER ARRAY KEYS(10000);
   INTEGER KOUNT,RESULT,SIZE1,UPPER,LOWER,
           I,J,FLAG,ERRORS,K;

   DO K = 1 TO SIZE1;
      KOUNT = 0;
      RESULT = KEYS(K);
      UPPER = SIZE1;
      LOWER = 1;
      FLAG = 0;
      IF (RESULT .LT. KEYS(LOWER)) THEN
         RESULT = -1
      ELSE
         IF (RESULT .GT. KEYS(UPPER)) THEN
            RESULT = -2
         ELSE
            DO WHILE (FLAG .EQ. 0);
               I = (UPPER + LOWER + 1)/2;
               IF (RESULT .LT. KEYS(I)) THEN
                  UPPER = I-1
               ELSE
                  IF (RESULT .GT. KEYS(I)) THEN
                     LOWER = I + 1
                  ELSE
                     IF (RESULT .EQ. KEYS(I)) THEN
                        FLAG = 1
                     ELSE
                     DO;
                        RESULT = -3;
                        FLAG = 1;
                     ENDDO;
               IF (UPPER .LT. LOWER) THEN
                  FLAG = 1
               ELSE
                  KOUNT = KOUNT + 1;
            ENDDO;
      IF (K .NE. I) THEN
         ERRORS = ERRORS + 1;
   ENDDO;
   STOP;
END.


C  THIS IS THE QUICK SORT ALGORITHM IN FORTRAN
C  'A' IS THE ARRAY HOLDING THE ITEMS TO BE SORTED
C  ALL INTEGERS IN 'A' ARE GENERATED BY THE HARNESS
C  PROGRAM

      SUBROUTINE SORT
      INTEGER I,M,J,P,T,Q,K,Q1,X,N,LT,UT,A
      COMMON /WCS/ I,M,J,P,T,Q,K,Q1,X,N,LT(14),UT(14)
     1 ,A(50000)

C  INITIALIZE THE VARIABLES AND CONSTANTS
      J = N
      I = 1
      M = 1

  200 IF (J - I .GT. 1) THEN
         P = (J + I)/2
         T = A(P)
         A(P) = A(I)
         Q = J
         DO 300 K = I + 1,Q
            IF (A(K) .GT. T) THEN
               DO 201 Q1 = Q,K,-1
                  IF (A(Q1) .LT. T) THEN
                     X = A(K)
                     A(K) = A(Q1)
                     A(Q1) = X
                     Q = Q1 - 1
                     GOTO 120
                  ENDIF
  201          CONTINUE
               Q = K - 1
               GOTO 110
  120       ENDIF
  300    CONTINUE
  110    A(I) = A(Q)
         A(Q) = T
         IF (2*Q .GT. I+J) THEN
            LT(M) = I
            UT(M) = Q - 1
            I = Q + 1
         ELSE
            LT(M) = Q + 1
            UT(M) = J
            J = Q - 1
         ENDIF
         M = M + 1
         GOTO 200
      ELSE
         IF (I .GE. J) THEN
            GOTO 160
         ELSE
            IF (A(I) .GT. A(J)) THEN
               X = A(I)
               A(I) = A(J)
               A(J) = X
            ENDIF
  160       M = M - 1
            IF (M .GT. 0) THEN
               I = LT(M)
               J = UT(M)
               GOTO 200
            ENDIF
         ENDIF
      ENDIF

      END   ! OF SORT


PROGRAM SORT;

\ THIS PROGRAM SORTS THE ELEMENTS OF AN ARRAY INTO
  ASCENDING ORDER.  THE METHOD USED IS THE "QUICKERSORT"
  ALGORITHM OF R.S. SCOWEN, ALGORITHM 271, CACM, VOL.
  8, NUMBER 11, OCTOBER 1965.  THIS VERSION WAS COPIED
  FROM THE JRS HLL MANUAL FOR THE AMGS SYSTEM.

  THE ALGORITHM WORKS BY CONTINUALLY SPLITTING THE ARRAY
  INTO PARTS SUCH THAT ALL ELEMENTS OF ONE PART ARE LESS
  THAN ALL ELEMENTS OF THE OTHER, WITH A THIRD PART
  IN THE MIDDLE CONSISTING OF A SINGLE ELEMENT.

  THE ARRAY TO BE SORTED IS PRE-SET IN 'A' AND THE NUMBER
  OF ELEMENTS IN THE ARRAY IS SET IN 'N'.  ON EXIT, THE
  ELEMENTS OF ARRAY 'A' ARE SORTED. \

   INTEGER I,M,J,P,T,Q,K,Q1,X,N;
   INTEGER ARRAY LT(14),UT(14),A(50000);

   J = N;
   I = 1;
   M = 1;

100: IF (J-I.GT.1) THEN
   DO;
      P = (J+I)/2;
      T = A(P);
      A(P) = A(I);
      Q = J;
      DO K=I+1 TO Q;
         IF (A(K).GT.T) THEN
         DO;
            DO Q1=Q DOWNTO K;
               IF (A(Q1).LT.T) THEN
               DO;
                  X = A(K);
                  A(K) = A(Q1);
                  A(Q1) = X;
                  Q = Q1-1;
                  GOTO 120;
               ENDDO;
            ENDDO;
            Q = K - 1;
            GOTO 110;
         ENDDO;
120:     CONTINUE;
      ENDDO;
110:  A(I) = A(Q);
      A(Q) = T;
      IF (2*Q .GT. I+J) THEN
      DO;
         LT(M) = I;
         UT(M) = Q-1;
         I = Q+1;
      ENDDO
      ELSE
      DO;
         LT(M) = Q+1;
         UT(M) = J;
         J = Q-1;
      ENDDO;
      M = M + 1;
      GOTO 100;
   ENDDO
   ELSE IF (I.GE.J) THEN GOTO 160
   ELSE
   DO;
      IF (A(I).GT.A(J)) THEN
      DO;
         X = A(I);
         A(I) = A(J);
         A(J) = X;
      ENDDO;
160:  M = M-1;
      IF (M.GT.0) THEN
      DO;
         I = LT(M);
         J = UT(M);
         GOTO 100;
      ENDDO;
   ENDDO;


C  SIEVE PROGRAM IN FORTRAN IV
C  COPIED FROM BYTE MAGAZINE, JAN 83
C  THE SIEVE OF ERATOSTHENES ALGORITHM IDENTIFIES
C  THE PRIME NUMBERS FROM 3 TO N.  IN THIS CASE
C  N = 16,381.  THE PRIMES ARE STORED IN
C  AN ARRAY NAMED 'PRIMES' FOR VERIFICATION
C  OF THE ALGORITHM IN THE HARNESS PROGRAM

      SUBROUTINE SIEVESUB
      INTEGER I,J,K,COUNT,ITER,PRIME,N,PRIMES
      LOGICAL FLAGS
      COMMON /F/ FLAGS(8191)
      COMMON /STORE/ I,J,K,COUNT,ITER,PRIME,N,PRIMES(1900)

      DO 192 ITER = 1,20
         COUNT = 0
         N = 1
         DO 10 I = 1,8191
   10    FLAGS(I) = .TRUE.
         DO 191 I = 1,8191
            IF (.NOT. FLAGS(I)) GOTO 191
            PRIME = I + I + 1
            PRIMES(N) = PRIME
            N = N + 1
            COUNT = COUNT + 1
            K = I + PRIME
            IF (K .GT. 8191) GOTO 191
            DO 160 J = K,8191,PRIME
  160       FLAGS(J) = .FALSE.
  191    CONTINUE
  192 CONTINUE

      END   ! OF SIEVESUB
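
For readers who want to check the sieve listing against a modern language, a minimal Python sketch follows. It is an illustration only and was not part of the study: the function name sieve_odd_primes is hypothetical, and the sketch assumes the same indexing as the Fortran listing, in which flag I stands for the odd number 2*I + 1.

```python
def sieve_odd_primes(size=8191):
    """Return the odd primes found by a sieve whose flag I represents 2*I + 1."""
    flags = [True] * (size + 1)      # index 0 is unused to keep 1-based indexing
    primes = []
    for i in range(1, size + 1):
        if not flags[i]:
            continue
        prime = i + i + 1
        primes.append(prime)
        # Indices i + prime, i + 2*prime, ... are the odd multiples of prime.
        for j in range(i + prime, size + 1, prime):
            flags[j] = False
    return primes


primes = sieve_odd_primes()
print(len(primes), "odd primes found; first five:", primes[:5])
```

Storing only the odd numbers halves the flag array, which is why 8191 flags are enough to cover the odd numbers up to 16,383.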


\ HLL VERSION OF THE SIEVE OF ERATOSTHENES
  THE PROGRAM IDENTIFIES THE PRIME NUMBERS BETWEEN
  1 AND N. \

PROGRAM SIEVE;

   INTEGER I,J,K,COUNT,L,PRIME,ZERO,M,TEN;
   INTEGER ARRAY FLAGS(8191), PRIMES(1900);

   DO L = 1 TO TEN;
      COUNT = 0;
      J = 1;
      DO I = 1 TO M;
         FLAGS(I) = 1;
      ENDDO;
      DO I = 1 TO M;
         IF (FLAGS(I) .EQ. 1) THEN
         DO;
            PRIME = I + I + 1;
            PRIMES(J) = PRIME;
            J = J + 1;
            COUNT = COUNT + 1;
            K = I + PRIME;
            DO WHILE (K .LE. M);
               FLAGS(K) = 0;
               K = K + PRIME;
            ENDDO;
         ENDDO;
      ENDDO;


C  BUBBLE SORT IN FORTRAN
C  THE INTEGERS IN ARRAY 'A' ARE SORTED INTO
C  ASCENDING ORDER BY CONTINUALLY MOVING THE
C  'NEXT' LARGEST ITEM TO ITS PROPER POSITION
C  IN THE ORDERING.  THE ALGORITHM IS IMPROVED
C  BY CHECKING EACH TIME THROUGH THE SORT TO
C  SEE IF ANY EXCHANGES HAVE BEEN MADE.  IF
C  NONE ARE MADE THEN THE PROGRAM TERMINATES.

      SUBROUTINE BUBBLE
      INTEGER I,N,XCHANG,TEMP,A
      COMMON /WCS/ I,N,XCHANG,TEMP,A(10000)

      XCHANG = .TRUE.
      DO WHILE (XCHANG .EQ. .TRUE.)
         XCHANG = .FALSE.
         N = N-1
         DO I = 1,N
            IF (A(I) .GT. A(I+1)) THEN
               TEMP = A(I)
               A(I) = A(I+1)
               A(I+1) = TEMP
               XCHANG = .TRUE.
            ENDIF


APPENDIX D


BIT MANIPULATION ALGORITHMS

C  BIT MANIPULATION PROGRAM IN FORTRAN
C  ARRAY 'A' HOLDS THE PREGENERATED VALUES TO
C  BE MANIPULATED.  'N' HOLDS THE NUMBER OF
C  TIMES THE MANIPULATION WILL OCCUR FOR TIMING
C  PURPOSES.

      SUBROUTINE BITMANIP
      INTEGER I,N,ROT,A
      COMMON /WCS/ I,N,ROT,A(100000)

      DO 400 I = 1,N
         A(I) = JIAND(A(I),A(I))
         A(I) = JNOT(A(I))
         A(I) = JNOT(A(I))
         A(I) = JIABS(A(I))
         A(I) = JIBITS(A(I),0,32)
         A(I) = IAND(A(I),A(I))
         A(I) = IOR(A(I),A(I))
         A(I) = JISHFTC(A(I),32,32)
  400 END DO

      END   ! OF BITMANIP


\ BIT MANIPULATION PROGRAM IN JRS HLL.  ARRAY 'A'
  HOLDS THE VALUES TO BE MANIPULATED.  N IS THE
  TOTAL NUMBER OF TIMES THE ITEMS WILL BE
  MANIPULATED.  'I' IS THE LOOPING VARIABLE. \

PROGRAM BITMANIP;
   INTEGER I,N,ROT;
   INTEGER ARRAY A(100000);

   DO I = 1 TO N;
      A(I) = A(I) .AND. A(I);
      A(I) = A(I) .XOR. (MASK(31,0));
      A(I) = A(I) .XOR. (MASK(31,0));
      A(I) = ABS(A(I));
      A(I) = SRL((A(I) .AND. (MASK(31,0))),0);
      A(I) = A(I) .AND. A(I);
      A(I) = A(I) .OR. A(I);
      A(I) = RLL(A(I),32);
   ENDDO;


      SUBROUTINE BITREV
C
C  X - COMPLEX ARRAY X(2**M)
C  M - NUMBER OF POINTS
C
C  BASED UPON AN FFT FIRST DEVELOPED BY SIGNALS
C  SCIENCE CORPORATION FOR PROJECT SALESCLERK.
C
C  FIRST TRANSCRIBED BY LCDR C LAURVICK, USN
C  MODIFIED BY LT M HARTONG, USN
C
C  ************************
C
      COMPLEX X(4096),T
      INTEGER N,NV2,NM1,M,I,J,K
C
C  REARRANGE ARRAY- BIT REVERSAL
C
      N = 2**M
      NV2=N/2
      NM1 = N - 1
      J = 1
      DO 50 I=1,NM1
      IF(I.GE.J) GO TO 25
      T=X(J)
      X(J)=X(I)
      X(I)=T
   25 K=NV2
   26 IF(K.GE.J) GO TO 50
      J=J-K
      K=K/2
      GO TO 26
   50 J=J+K
      RETURN
      END
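
The reordering performed by the bit reversal listing can be sketched in Python as follows. This is an illustrative translation under stated assumptions and was not part of the study: the function name bit_reverse_permute is hypothetical, indices are 0-based rather than 1-based (so the listing's test on K and J becomes k <= j), and the input length must be a power of two.

```python
def bit_reverse_permute(x):
    """Reorder a list in place into bit-reversed index order.

    The length of x must be a power of two.
    """
    n = len(x)
    j = 0                          # 0-based counterpart of the listing's J = 1
    for i in range(n - 1):
        if i < j:
            x[i], x[j] = x[j], x[i]
        k = n // 2
        while k <= j:              # propagate the carry of a bit-reversed increment
            j -= k
            k //= 2
        j += k
    return x


print(bit_reverse_permute(list(range(8))))
```

The inner loop adds one to j from the most significant bit downward, so j always holds the bit-reversed value of the next i without reversing each index separately.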



PROGRAM  9ITREV? 


=5 1  r  REVERSAL  FOR  F*T  -  AMGS  HLL  VERSION 


3ASE  cPON  AN  FFT  FIRST  DEVELOPFD  BY  SIGNALS 
SCIENCE  CORP.  FOR  PROJECT  SALFSCLERK. 
TRANSLATED  INTO  HLL  FOR  THE  AlCS  BY  LT  'A  HARTONG 


INTEGER  1,  J,Ni,N,L,LEO,L£t  ,  TP,<0  m,K, 

NV2 #  NM, NMi » P  J 

REAL  IREAL, JI 3# PHASE , COSX , S TNX , T WE AL , T I «AO, 
T2REAL, T21MAG,TMP,PI ,R1 ,R2,R3,K$; 

REAL  ARRAY  XRE  AL  ( 'JORto  )  ,  X  I «  AG  ( '1 0  Of) )  ; 

\ REPEAT 30 TIMES FOR TIMING PURPOSES \

DO L = 1 TO 30;

\ N = 2 ** M \

    N = 1;
    DO KOUNT = 1 TO M;
        N = N * 2;
    ENDDO;

\ INITIALIZE THE CONSTANTS \

    NV2 = N/2;
    NM1 = N - 1;
    J = 1;

\ REARRANGE ARRAY - BIT REVERSAL \

    DO I = 1 TO NM1;
        IF (I.GE.J) THEN GOTO 25;
        TREAL = XREAL(J);
        TIMAG = XIMAG(J);
        XREAL(J) = XREAL(I);
        XIMAG(J) = XIMAG(I);
        XREAL(I) = TREAL;
        XIMAG(I) = TIMAG;
25:     K = NV2;
26:     IF (K.GE.J) THEN GOTO 30;
        J = J - K;
        K = K / 2;




APPENDIX  e 


SAMPLE  HARNESS  SETUP 


      PROGRAM FACTORIAL
C
C     THIS IS THE FACTORIAL PROGRAM, FORTRAN VERSION
C     AN INTEGER IS READ AND THE FACTORIAL OF THAT
C     INTEGER IS PRINTED. THE FACTORIAL OF THAT
C     NUMBER IS CALCULATED 100,000 TIMES BEFORE BEING
C     PRINTED FOR TIMING PURPOSES.
C
      INTEGER TOTAL, VALUE, TEMP, COUNT, ZERO,
     1        TIMER, HANDLE, IRET, INST
      INTEGER TIMES, C, V, T, ANS, ONE
      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO, ONE

   10 FORMAT(/,' ENTER AN INTEGER BETWEEN 1 AND 12',/)
   20 FORMAT(' THE FACTORIAL OF ',I3,' ',I9)
   30 FORMAT(/,' FACTORIAL USING FORTRAN SUBROUTINE WITH
     1 COMMON'/)
   40 FORMAT(/,' FACTORIAL USING JRS HLL WITH COMMON'/)
   50 FORMAT(/,' FACTORIAL USING STRAIGHT FORTRAN CODE
     1 WITH COMMON'/)
   60 FORMAT(/,' CPU TIME = ',F5.2,' SECONDS'/)
   70 FORMAT(I2)
   80 FORMAT(/,' FACTORIAL USING STRAIGHT FORTRAN
     1 WITHOUT COMMON'/)

C     READ IN WHAT FACTORIAL TO DETERMINE
      WRITE(6,10)
      READ(5,70) TEMP
C     INITIALIZE THE CONSTANTS
      TIMES = 100000
      COUNT = TIMES          ! NUMBER OF TIMES TO EXECUTE LOOP
      ZERO = 0
      TOTAL = 1
      ONE = 1
      C = TIMES
      T = TEMP


C     THIS PART IS STRAIGHT FORTRAN WITHOUT COMMON
      WRITE(6,80)
      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR
      DO WHILE (C .GT. 0)
         C = C - 1
         V = T
         ANS = 1
         DO WHILE (V .GT. 0)
            ANS = ANS * V
            V = V - 1
         END DO
      END DO
      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR
      WRITE(6,20) T, ANS
      WRITE(6,60) FLOAT(TIMER)/100.0

C     THIS IS THE STRAIGHT FORTRAN CODE VERSION WITH COMMON
      WRITE(6,50)
      COUNT = TIMES
      TOTAL = ONE
      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR
      DO WHILE (COUNT .GT. 0)
         COUNT = COUNT - 1
         VALUE = TEMP
         TOTAL = ONE
         DO WHILE (VALUE .GT. 0)
            TOTAL = TOTAL * VALUE
            VALUE = VALUE - 1
         END DO
      END DO
      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR
      WRITE(6,20) TEMP, TOTAL
      WRITE(6,60) FLOAT(TIMER)/100.0


C     THIS PART IS A SUBROUTINE CALL IN FORTRAN WITH COMMON
      COUNT = TIMES
      TOTAL = ONE
      WRITE(6,30)
      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR
      CALL FAC
      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR
      WRITE(6,20) TEMP, TOTAL
      WRITE(6,60) FLOAT(TIMER)/100.0

C     THIS PART USES JRS HLL WITH COMMON
      TOTAL = ONE
      COUNT = TIMES
      WRITE(6,40)
      IF (.NOT. LIB$INIT_TIMER(HANDLE)) CALL ERR
      CALL XFC(TOTAL, IRET, INST)
      IF (.NOT. LIB$STAT_TIMER(2,TIMER,HANDLE)) CALL ERR
      WRITE(6,20) TEMP, TOTAL
      WRITE(6,60) FLOAT(TIMER)/100.0
      END
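
The harness pattern used in each part of the program above is to start a timer, repeat the factorial loop many times, then read the elapsed CPU time and print it. It can be illustrated with a short Python sketch. This is an analogue for the reader, not a translation of the thesis code: `time.process_time` stands in for the VAX timer calls, and the names `factorial_loop` and `timed_factorial` are hypothetical.

```python
import time


def factorial_loop(value):
    """Multiply value * (value - 1) * ... * 1 with a countdown loop."""
    total = 1
    while value > 0:
        total *= value
        value -= 1
    return total


def timed_factorial(value, times=100000):
    """Repeat the factorial `times` times; return (result, CPU seconds used)."""
    start = time.process_time()  # CPU time of this process, in seconds
    total = 1
    for _ in range(times):
        total = factorial_loop(value)
    return total, time.process_time() - start


if __name__ == "__main__":
    result, cpu = timed_factorial(12)
    print(f"THE FACTORIAL OF 12 = {result}")
    print(f"CPU TIME = {cpu:.2f} SECONDS")
```

Repeating the computation many times, as the listing does with its 100,000 iterations, keeps the measured interval well above the resolution of the timer.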

C     THE FACTORIAL SUBROUTINE IN FORTRAN
      SUBROUTINE FAC
      INTEGER TOTAL, VALUE, TEMP, COUNT, ONE
      COMMON /WCS/ TOTAL, VALUE, TEMP, COUNT, ZERO, ONE
      DO WHILE (COUNT .GT. ZERO)
         COUNT = COUNT - 1
         VALUE = TEMP
         TOTAL = ONE
         DO WHILE (VALUE .GT. ZERO)


\ FACTORIAL PROGRAM IN JRS HLL \
PROGRAM FACTORIAL;

INTEGER  TOTAL,  VALUE,  TEMP,  COUNT 


DO WHILE (COUNT .GT. ZERO);
    COUNT = COUNT - 1;
    VALUE = TEMP;
    TOTAL = 1;
    DO WHILE (VALUE .GT. ZERO);
        TOTAL = TOTAL * VALUE;
        VALUE = VALUE - 1;




C     SUBROUTINE ERR IS USED FOR SIGNALING ERRORS FROM
C     THE TIMING MECHANISM.

SUBROUTINE  ERR 
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